Python: recognizing image CAPTCHAs with the third-party pytesseract library

Date: 2019-11-06 13:02:20


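Environment setup:

1. Install Tesseract OCR

Download page for the Windows installers: https://digi.bib.uni-mannheim.de/tesseract/

  Baidu Netdisk mirror:

    Link: https://pan.baidu.com/s/16RoJ19WynWOKI4Zpr0bKzA
    Extraction code: 5hst

After downloading, run the installer.

2. Configure the environment variable:

  Edit Path under the system variables and add the installation directory: D:\Program Files\Tesseract-OCR (use your own installation path)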
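A quick way to confirm that the Path change took effect is to ask Python where it finds the executable. The sketch below uses only the standard library; the helper name `find_tesseract` is made up for illustration. Open a new terminal after editing Path, since running programs do not see the change.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the executable found on PATH, or None."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH; re-check the system variable.")
    else:
        print("tesseract found at:", path)
```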

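3. Install the third-party Python libraries:

  pip install pillow       # Python imaging library used to load and preprocess the image
  pip install pytesseract  # Python wrapper around the tesseract executable
  pip install requests     # used by the example below to download the CAPTCHA image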
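To check that the packages are importable from the interpreter you actually run, something like the following may help. It is a small sketch with a hypothetical helper name (`missing_packages`); note that the pillow package is imported under the name `PIL`.

```python
import importlib.util


def missing_packages(names=("PIL", "pytesseract", "requests")):
    """Return the top-level import names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```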

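4. Edit pytesseract.py to point at the installed tesseract.exe

Edit pytesseract.py in the installed pytesseract package. This step is needed whenever Python cannot find tesseract on Path; without it, running the code raises TesseractNotFoundError. The value must be the full path to the executable, not just the installation directory, and a raw string keeps the backslashes intact:

tesseract_cmd = r'D:\Program Files\Tesseract-OCR\tesseract.exe'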
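An edit to the installed pytesseract.py is lost whenever the package is upgraded. pytesseract also lets a script set the same variable at runtime through `pytesseract.pytesseract.tesseract_cmd`, which can serve as an alternative to the edit above. A minimal sketch, assuming the install path used in this article; `pick_tesseract_cmd` and `configure_pytesseract` are hypothetical helper names, not part of pytesseract.

```python
import os
import shutil

# Assumed install location from this article; replace with your own path.
DEFAULT_CANDIDATES = (r"D:\Program Files\Tesseract-OCR\tesseract.exe",)


def pick_tesseract_cmd(candidates=DEFAULT_CANDIDATES):
    """Return the first candidate file that exists, else the PATH lookup result."""
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return shutil.which("tesseract")  # None when nothing is found


def configure_pytesseract(candidates=DEFAULT_CANDIDATES):
    """Point pytesseract at the executable without editing pytesseract.py."""
    import pytesseract  # imported lazily so pick_tesseract_cmd works without it

    cmd = pick_tesseract_cmd(candidates)
    if cmd is None:
        raise FileNotFoundError("tesseract executable not found")
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Calling `configure_pytesseract()` once at the top of a script, before any `image_to_string` call, should be enough.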

Code implementation

One approach to CAPTCHA recognition, suitable for simple CAPTCHAs. The code can be used as-is.

import requests
from PIL import Image
import pytesseract

# CAPTCHA URL
url = "http://cloud.xxxx.com/checkCode?0.7337270680854053"
response = requests.get(url).content
# Write the image bytes to a file
with open('test.png', 'wb') as f:
    f.write(response)
# Recognize the CAPTCHA
# Step 1: open the file with PIL (provided by the pillow package)
image = Image.open('test.png')
image = image.convert('L')  # convert to grayscale
threshold = 160             # binarization threshold
# Lookup table for point(): gray levels below the threshold map to 0, all others to 1
table = []
for i in range(256):
    if i < threshold:
        table.append(0)
    else:
        table.append(1)

image = image.point(table, '1')  # binarize the grayscale image using the table built above
image.show()
result = pytesseract.image_to_string(image)  # run OCR on the binarized image
print('Image content:', result)
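The same steps can be wrapped into reusable functions. The sketch below is one possible arrangement, not the original author's code: the helper names are invented here, the `--psm 7` option (treat the image as a single text line) assumes a one-line CAPTCHA, and the cleanup step assumes the CAPTCHA contains only ASCII letters and digits.

```python
import string


def build_threshold_table(threshold=160):
    """256-entry lookup table: 0 for gray levels below threshold, 1 otherwise."""
    return [0 if level < threshold else 1 for level in range(256)]


def clean_captcha_text(raw, allowed=string.ascii_letters + string.digits):
    """Drop whitespace and every character outside the allowed set."""
    return "".join(ch for ch in raw if ch in allowed)


def recognize_captcha(path, threshold=160):
    """Grayscale, binarize, and OCR the image stored at `path`."""
    from PIL import Image  # imported lazily; the helpers above need neither library
    import pytesseract

    with Image.open(path) as img:
        binary = img.convert("L").point(build_threshold_table(threshold), "1")
    # Assumption: the CAPTCHA is a single line of text, hence --psm 7.
    raw = pytesseract.image_to_string(binary, config="--psm 7")
    return clean_captcha_text(raw)
```

Tesseract output usually ends with whitespace and may contain stray punctuation, so `clean_captcha_text("7 k 9Q\n")` returns `"7k9Q"`. The threshold of 160 is the value used above; other CAPTCHA styles may need a different one.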


Original article: https://www.cnblogs.com/fppblog/p/11804196.html
