Tags: scraping Youdao Translate
1. First, get the URL, the query string (form data), and the request headers ready.
<pre>url = http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule&sessionFrom
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
data={
"i":"python%0A",
"from":"AUTO",
"to":"AUTO",
"smartresult":"dict",
"client":"fanyideskweb",
"salt":"1506836328053",
"sign":"945d7dd737a4fb4327914ff3f3bf7d90",
"doctype":"json",
"version":"2.1",
"keyfrom":"fanyi.web",
"action":"FY_BY_ENTER",
"typoResult":"true"
}
</pre>
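The `i` value captured from the browser, `python%0A`, is simply the word followed by a percent-encoded newline. Here is a minimal sketch of that encoding, assuming Python 3, where the `urlencode` used later in this post lives in `urllib.parse`. The trimmed `form` dict is only an illustration, not the full set of fields the site sends.

```python
from urllib.parse import unquote, urlencode

# The value seen in the browser's form data.
captured = "python%0A"

# %0A is the percent-encoded newline, so this is "python" plus "\n".
decoded = unquote(captured)
print(repr(decoded))

# urlencode turns a dict into the key=value&key=value body of a POST.
# This dict is a shortened, illustrative subset of the captured fields.
form = {"i": "python", "from": "AUTO", "to": "AUTO", "doctype": "json"}
body = urlencode(form).encode("utf-8")
print(body)

# Non-ASCII input is encoded as UTF-8 bytes and then percent-escaped.
chinese_body = urlencode({"i": "中国"})
print(chinese_body)
```

In the scripts below the word is typed in by the user, so the trailing newline is not needed.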
2. Then I spent the whole morning in an endless fight with the Youdao web translator and got schooled again and again...
Here is my own code:
#!/usr/bin/env python
#-*- coding:utf-8 -*-
import urllib
import urllib2
##url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&sessionFrom="
## Below is what came back when using the URL above; it does not feel complete.
##>>>
##Enter the word:中国
##{"type":"ZH_CN2EN","errorCode":0,"elapsedTime":0,"translateResult":[[{"src":"中国","tgt":"China"}]],"smartResult":{"type":1,"entries":["","China"]}}
## Do not use url = "http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule&sessionFrom=". I suspect the _o stands for "only once".
url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&sessionFrom"
## This is the result returned when using the other URL; it is slightly more complete.
##>>>
##Enter the word:python
##{"type":"EN2ZH_CN","errorCode":0,"elapsedTime":1,"translateResult":[[{"src":"python","tgt":"python"}]],"smartResult":{"type":1,"entries":["","n. 巨蟒;大蟒","n. (法)皮东(人名)"]}}
##head="Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
queryword = raw_input("Enter the word:")
##data = {}
##data["i"] = queryword
##data["from"]="AUTO"
##data["to"]="AUTO"
##data["smartresult"]="dict"
##data["client"]="fanyideskweb"
##data["salt"]="502865709143"
##data["sign"]="e7b725d55dd02ab7b3a17c44170950ad"
##data["doctype"]="json"
##data["version"]="2.1"
##data["keyfrom"]="fanyi.web"
##data["action"]="FY_BY_CLlCKBUTTON"
##data["typoResult"]="true"
header = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
}
data={
"i":queryword,
"from":"AUTO",
"to":"AUTO",
"smartresult":"dict",
"client":"fanyideskweb",
"salt":"1506836328053",
"sign":"945d7dd737a4fb4327914ff3f3bf7d90",
"doctype":"json",
"version":"2.1",
"keyfrom":"fanyi.web",
"action":"FY_BY_ENTER",
"typoResult":"true"
}
# URL-encode the form fields into the POST body
data = urllib.urlencode(data).encode("utf-8")
# Supplying data makes urllib2 send a POST instead of a GET
request = urllib2.Request(url,data=data,headers=header)
# Read the raw reply and decode it to unicode
html=urllib2.urlopen(request).read().decode("utf-8")
print html
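To see what the script above actually hands to the network layer, the request can be built and inspected without sending anything. This is a minimal sketch assuming Python 3, where `urllib2.Request` became `urllib.request.Request`. The helper name `build_request`, the shortened form dict, and the shortened User-Agent are my own illustrative choices; I am not claiming the server accepts fewer fields.

```python
from urllib.parse import urlencode
from urllib.request import Request

URL = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&sessionFrom"


def build_request(word):
    """Build (but do not send) a POST request for the given word."""
    # Illustrative subset of the form fields shown in this post.
    form = {"i": word, "from": "AUTO", "to": "AUTO", "doctype": "json"}
    body = urlencode(form).encode("utf-8")
    # Shortened User-Agent, for illustration only.
    headers = {"User-Agent": "Mozilla/5.0"}
    return Request(URL, data=body, headers=headers)


req = build_request("python")
# A Request that carries data is sent as POST rather than GET.
print(req.get_method())
print(req.data)
# Header names are stored capitalized, hence "User-agent".
print(req.get_header("User-agent"))
```

Passing the result to `urllib.request.urlopen` would perform the actual call, which is the step the script above does with `urllib2.urlopen`.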
Here is code I found online:
# -*- coding: utf-8 -*-
import urllib
import urllib2
import json
#Request URL
while 1:
    i = raw_input("Enter the text to translate: ")
    # Typing !q ends the loop
    if i == '!q':
        print("Exiting")
        break
    url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&sessionFrom="
    data = {}
    # Option 1: pass the headers as a dict
    #head={}
    #head["User-Agent"]="Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"
    # Option 2: keep only the User-Agent string and attach it with add_header
    head = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"
    # Form Data: all fields
    data["i"] = i
    data["from"] = "AUTO"
    data["to"] = "AUTO"
    data["smartresult"] = "dict"
    data["client"] = "fanyideskweb"
    data["salt"] = "502865709143"
    data["sign"] = "e7b725d55dd02ab7b3a17c44170950ad"
    data["doctype"] = "json"
    data["version"] = "2.1"
    data["keyfrom"] = "fanyi.web"
    data["action"] = "FY_BY_CLlCKBUTTON"
    data["typoResult"] = "true"
    # URL-encode the form data
    data = urllib.urlencode(data).encode("utf-8")
    # Open the connection
    # Option 1
    #req = urllib2.Request(url,data,head)  # Request built with the headers dict
    # Option 2
    req = urllib2.Request(url,data)
    req.add_header("User-Agent",head)
    response = urllib2.urlopen(req)
    # Decode to unicode
    html = response.read().decode("utf-8")  # the body is JSON
    # Parse the JSON
    target = json.loads(html)
    # Print the translated text
    print(target["translateResult"][0][0]["tgt"])
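The last lines of that script pull one field out of the JSON reply. As a sketch of the same parsing step that runs offline, the snippet below uses the sample reply copied from the output shown earlier in this post. It assumes Python 3; the helper name `extract_translation` is hypothetical, and treating a nonzero `errorCode` as a failure is my assumption, based only on the successful samples showing `0`.

```python
import json

# Sample reply copied from the output shown earlier in this post.
sample = '{"type":"EN2ZH_CN","errorCode":0,"elapsedTime":1,"translateResult":[[{"src":"python","tgt":"python"}]],"smartResult":{"type":1,"entries":["","n. 巨蟒;大蟒","n. (法)皮东(人名)"]}}'


def extract_translation(raw):
    """Return (translated text, list of dictionary entries) from a reply."""
    reply = json.loads(raw)
    # Assumption: anything other than 0 means the request failed.
    if reply.get("errorCode") != 0:
        raise ValueError("unexpected errorCode: %r" % reply.get("errorCode"))
    # translateResult is a list of lists of {"src": ..., "tgt": ...} dicts.
    tgt = reply["translateResult"][0][0]["tgt"]
    # smartResult may be absent; drop the empty strings among its entries.
    entries = [e for e in reply.get("smartResult", {}).get("entries", []) if e]
    return tgt, entries


tgt, entries = extract_translation(sample)
print(tgt)
for entry in entries:
    print(entry)
```

Reading `smartResult` as well is what gives the "slightly more complete" dictionary entries mentioned above, since `translateResult` alone only carries the plain translation.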
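Summary...
A beginner's blood and tears: I spent the whole morning looking things up and changing the code every which way, and in the end I was beaten by a single lowercase o... Also, a URL with the trailing "=" and one without it are not the same thing. Painful.
Inside I am screaming, but I still have to keep up appearances.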
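Differences like these are easy to miss by eye, so here is a small sketch that takes the two URLs from this post apart. It assumes Python 3 and its `urllib.parse` module; the helper name `describe` is hypothetical. It only shows how Python's standard library reads the two URLs. How the Youdao server treats them is not something this snippet can tell.

```python
from urllib.parse import parse_qsl, urlsplit

with_o = "http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule&sessionFrom="
without_o = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&sessionFrom"


def describe(url):
    """Split a URL into its path and its list of query parameters."""
    parts = urlsplit(url)
    # keep_blank_values=True keeps parameters that have no value.
    return parts.path, parse_qsl(parts.query, keep_blank_values=True)


path_a, query_a = describe(with_o)
path_b, query_b = describe(without_o)

# The lowercase o shows up as a different path.
print(path_a, path_b)
# Compare the parsed parameters of the two query strings.
print(query_a == query_b)
```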
This article comes from the "phize" blog; please be sure to keep this attribution: http://12756301.blog.51cto.com/12746301/1970087
Original article: http://12756301.blog.51cto.com/12746301/1970087