# -*- coding: utf-8 -*-
import urllib.request
import re
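# Weibo search results page; the query keyword is double URL-encoded
url='http://s.weibo.com/weibo/%25E9%25BE%2599%25E9%25BA%2592&Refer=STopic_box'
# Download the page as bytes, then decode it to str
urlfile=urllib.request.urlopen(url).read()
urlfile=urlfile.decode('UTF-8')
r1=re.compile('[\u4e00-\u9fa5]{2,4}') # match runs of 2 to 4 Chinese characters
wordList=re.findall(r1,urlfile)
print(wordList)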
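
The same pattern can be tried offline on a literal string, which makes it easier to see what `{2,4}` actually returns. This is only a minimal sketch: the helper name `extract_chinese_words` and the sample HTML are made up for illustration and do not come from the original post.

```python
import re

# Runs of 2 to 4 consecutive characters in the range U+4E00..U+9FA5,
# the same character class used in the snippet above.
CHINESE_RUN = re.compile('[\u4e00-\u9fa5]{2,4}')

def extract_chinese_words(text):
    """Return all non-overlapping 2-4 character Chinese runs (hypothetical helper)."""
    return CHINESE_RUN.findall(text)

# Made-up sample markup standing in for a downloaded page.
sample = '<a href="#">龙麒</a> <p>微博搜索结果页</p> <span>好</span>'
words = extract_chinese_words(sample)
print(words)  # ['龙麒', '微博搜索', '结果页']
```

Note that the quantifier is greedy and knows nothing about word boundaries: the seven-character run is cut into a four-character chunk plus a three-character remainder, and the single character is dropped because it is shorter than two.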
Original article: http://www.cnblogs.com/pkmnexp/p/4694884.html