python爬虫编写英译中小程序

时间：2019-03-18 13:41:14 阅读：235 评论：0 收藏：0 [点我收藏+]

标签：获得 beautiful finally port nta roi python爬虫 ict dict

1.选择一个翻译页面，我选择的是有道词典（http://dict.youdao.com）

2.随便输入一个英语单词进行翻译，然后查看源文件，找到翻译后的内容所在的位置，看它在什么标签里

技术图片

3.开始编写程序

（1）首先引入requests库跟BeautifulSoup库

（2）更改请求头，防止被页面发现是爬虫，可以在审查元素里找

（3）确定URL，在有道是 http://dict.youdao.com/w/%s/#keyfrom=dict2.top

（4）开始写简单的程序，主要内容就三行

第一步：r = requests.get(url=‘ ‘,headers=)

用requests向页面发出请求，事先写好相应的请求头和URL

第二步：soup = BeautifulSoup(r.text,"lxml")

用BeautifulSoup把获得的text文件，转化为HTML格式

第三步：s = soup.find(class_=‘trans-container‘)(‘ul‘)[0](‘li‘)

.find()方法用于寻找匹配的信息，class、ul、li是所在的标签，这一步根据不同的内容有所不同，

根据源文件相应改变，[0]文本所在位置

（5）进行优化，加入try...except...finally

4.完整程序

 1 import requests
 2 from bs4 import BeautifulSoup
 3 
 4 word = input("Enter a word (enter ‘q‘ to exit):")
 5 header ={‘User-agent‘:‘Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Mobile Safari/537.36‘}
 6 while word !=‘q‘:
 7     try:
 8         r = requests.get(url=‘http://dict.youdao.com/w/%s/#keyfrom=dict2.top‘%word,headers=header)
 9 
10         soup = BeautifulSoup(r.text,"lxml")
11     
12         s = soup.find(class_=‘trans-container‘)(‘ul‘)[0](‘li‘)
13 
14         for item in s:
15             if item.text:
16                 print(item.text)
17         print(‘=‘*40+‘\n‘)
18     except Exception:
19         print(‘Sorry,there is a error!\n‘)
20     finally:
21         word =input("Enter a word (enter ‘q‘ to exit):")

python爬虫编写英译中小程序

标签：获得 beautiful finally port nta roi python爬虫 ict dict

原文地址：https://www.cnblogs.com/LIAN8/p/10551556.html

踩

(0)

评论一句话评论（0）

分享档案

更多>

2021年07月29日 (22)
2021年07月28日 (40)
2021年07月27日 (32)
2021年07月26日 (79)
2021年07月23日 (29)
2021年07月22日 (30)
2021年07月21日 (42)
2021年07月20日 (16)
2021年07月19日 (90)
2021年07月16日 (35)

周排行