Tags: .net https hub html target selenium lib ted urllib2
urllib
urllib2
Beautiful Soup
http://www.crummy.com/software/BeautifulSoup/
lxml
http://lxml.de
HTQL
http://htql.net/
Scrapy
http://scrapy.org/
Mechanize
http://wwwsearch.sourceforge.net/mechanize/
PyQuery
http://pythonhosted.org/pyquery/index.html
requests
http://docs.python-requests.org/en/latest/
selenium
http://selenium-python.readthedocs.org/en/latest/
Also worth adding:
httplib
httplib2
One more addition:
Ghost.py
https://github.com/jeanphix/Ghost.py
On top of these: multithreading or multiprocessing combined with a queue,
and access through a proxy.
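
The last point (worker threads fed by a queue, plus proxy access) can be sketched roughly as follows. This is only a minimal illustration using the Python 3 standard library (urllib.request is the Python 3 successor to urllib2), not a recommended crawler design. The names make_opener, fetch and crawl, the proxy address, and the example URL are all placeholders invented for this sketch.

```python
import queue
import threading
import urllib.request


def make_opener(proxy_url=None):
    """Build a urllib opener; route HTTP(S) traffic through proxy_url if given."""
    handlers = []
    if proxy_url:
        # proxy_url is a hypothetical address such as "http://127.0.0.1:8080"
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        )
    return urllib.request.build_opener(*handlers)


def fetch(url, opener, timeout=10):
    """Download one URL with the given opener and return the raw bytes."""
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()


def crawl(urls, fetch_fn, num_workers=4):
    """Run fetch_fn(url) for every URL using worker threads fed by a queue.

    Returns a dict mapping each URL to its result, or to the exception
    raised while fetching it.
    """
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            url = tasks.get()
            if url is None:  # sentinel: no more work for this thread
                break
            try:
                outcome = fetch_fn(url)
            except Exception as exc:  # keep the error instead of killing the thread
                outcome = exc
            with lock:
                results[url] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Placeholder proxy and URL; replace with real values before running.
    opener = make_opener("http://127.0.0.1:8080")
    pages = crawl(["http://example.com/"], lambda u: fetch(u, opener))
    for url, outcome in pages.items():
        print(url, type(outcome).__name__)
```

Threads are usually enough here because the work is I/O-bound; for CPU-heavy parsing, the same queue pattern can be moved to the multiprocessing module instead.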
Original article: http://www.cnblogs.com/yu9347/p/6764399.html