Scraping the top 10 boys' and girls' names and related information from https://www.parenting.com/baby-names/boys/earl

Posted: 2019-12-21 20:23:55

Tags: findall, image, source code, wrapper, src, rom, -name, top, ram

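The script starts from that page and collects the links to the boy-name and girl-name list articles. It then visits each article, pulls out every name (from its h4 heading) together with its Meaning, Origin, and "Why it's big" paragraphs, and saves the results to wt10.csv. Source code: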

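```python
import io
import re
import sys

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Re-wrap stdout as GB18030 so printing non-ASCII text does not raise
# UnicodeEncodeError on a Chinese-locale Windows console.
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="gb18030")

START_URL = "https://www.parenting.com/baby-names/boys/earl"
# Article URLs for the boy and girl name lists linked from the start page
LIST_PREFIXES = (
    "https://www.parenting.com/pregnancy/baby-names/baby-boy-names/",
    "https://www.parenting.com/pregnancy/baby-names/girl-baby-names/",
)

r = requests.get(START_URL, timeout=10)
soup = BeautifulSoup(r.text, "lxml")

# Collect links that point to one of the name-list articles
links = []
for a in soup.find_all("a", href=True):
    if any(prefix in a["href"] for prefix in LIST_PREFIXES):
        links.append(a["href"])

names, meanings, origins, descriptions = [], [], [], []

# Patterns for the labelled paragraphs inside each name's description block
origin_re = re.compile(r"<p><strong>Origin:</strong>\s(.*?)</p>", re.S)
meaning_re = re.compile(r"<p><strong>Meaning:</strong>\s(.*?)</p>", re.S)
# The site writes "it’s" with a typographic apostrophe; accept a plain one too
why_re = re.compile(r"<p><strong>Why it[’']s big:</strong>\s(.*?)</p>", re.S)

# dict.fromkeys removes duplicate links while keeping their page order
for url in dict.fromkeys(links):
    r = requests.get(url, timeout=10)
    soup = BeautifulSoup(r.text, "lxml")
    source = str(soup.find_all(attrs={"class": "description"}))

    # Each name is an <h4> heading; skip empty headings
    names += [h for h in re.findall(r"<h4>(.*?)</h4>", r.text) if h != ""]
    origins += origin_re.findall(source)
    meanings += meaning_re.findall(source)
    descriptions += why_re.findall(source)

print(names)
print(meanings)
print(origins)
print(descriptions)

# Wrapping each list in a Series pads shorter columns with NaN; a plain dict
# of lists makes pd.DataFrame raise ValueError when the lengths differ.
frame = pd.DataFrame({
    "EnName": pd.Series(names),
    "Meaning": pd.Series(meanings),
    "Origin": pd.Series(origins),
    "Description": pd.Series(descriptions),
})
# GB18030, presumably so the file opens correctly in Excel on Chinese Windows
frame.to_csv("wt10.csv", encoding="gb18030")
```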
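The script gathers each field into its own list, so the four columns only line up if every name on every page has all three labelled paragraphs; if one name lacks, say, an Origin paragraph, every later origin shifts onto the wrong name. A minimal sketch of an alternative, assuming the same h4-then-paragraphs layout: split the HTML at each h4 heading and read the fields per name, so a missing field becomes None instead of misaligning the table. SAMPLE_HTML is invented markup shaped like what the script's regular expressions expect (not a copy of parenting.com, whose markup has likely changed since 2019), and parse_records and FIELDS are hypothetical names.

```python
import re

# Hypothetical markup mimicking the layout the scraper's patterns assume.
SAMPLE_HTML = """
<h4>NameA</h4>
<div class="description">
<p><strong>Origin:</strong> Origin A</p>
<p><strong>Meaning:</strong> Meaning A</p>
<p><strong>Why it’s big:</strong> Description A.</p>
</div>
<h4>NameB</h4>
<div class="description">
<p><strong>Origin:</strong> Origin B</p>
<p><strong>Meaning:</strong> Meaning B</p>
</div>
"""

# (output column, regex fragment for the bold label) -- hypothetical mapping
FIELDS = [
    ("Meaning", "Meaning"),
    ("Origin", "Origin"),
    ("Description", "Why it[’']s big"),
]


def parse_records(html):
    """Split html at each <h4> heading and read the labelled fields per name."""
    records = []
    # A capturing group makes re.split return [before, name1, body1, name2, body2, ...]
    parts = re.split(r"<h4>(.*?)</h4>", html)
    for name, body in zip(parts[1::2], parts[2::2]):
        if not name:
            continue  # skip empty headings, as the original script does
        record = {"EnName": name}
        for column, label in FIELDS:
            match = re.search(rf"<p><strong>{label}:</strong>\s(.*?)</p>", body, re.S)
            record[column] = match.group(1) if match else None
        records.append(record)
    return records


for record in parse_records(SAMPLE_HTML):
    print(record)
```

Here NameB has no "Why it's big" paragraph, so its Description comes back as None while its other fields stay on the right row. A list of such dicts can be passed directly to pd.DataFrame(records).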
Screenshot of the resulting CSV file:

[image]
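If pandas is used only for the final export, the standard-library csv module can write the same four columns. This is a sketch, not the original author's code: write_rows and HEADER are hypothetical names, and itertools.zip_longest pads shorter lists with empty cells at the end (it does not fix misaligned rows). Unlike DataFrame.to_csv with its defaults, it writes no index column.

```python
import csv
import io
from itertools import zip_longest

HEADER = ["EnName", "Meaning", "Origin", "Description"]


def write_rows(fileobj, names, meanings, origins, descriptions):
    """Write one CSV row per position; missing trailing values become empty cells."""
    writer = csv.writer(fileobj)
    writer.writerow(HEADER)
    for row in zip_longest(names, meanings, origins, descriptions, fillvalue=""):
        writer.writerow(row)


# Writing to a real file (GB18030 as in the original script):
# with open("wt10.csv", "w", newline="", encoding="gb18030") as f:
#     write_rows(f, names, meanings, origins, descriptions)

buf = io.StringIO()
write_rows(buf, ["NameA", "NameB"], ["Meaning A"], ["Origin A", "Origin B"], [])
print(buf.getvalue())
```

GB18030 suits Excel on a Chinese-locale Windows system; elsewhere, "utf-8-sig" is a common choice because its byte-order mark lets Excel detect UTF-8.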
Original post: https://www.cnblogs.com/c1q2s3/p/12078047.html