Tags: soup, color, writer, scraping, iter, partial, save, English names, attr
Scraping detailed information about names from a website
Partial code
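# Pagination
# 50 records per page; counts, url and get_contents come from the rest of the script
pageNum = (counts + 49) // 50  # ceiling division, always an int
page = 1
while page <= pageNum:
    # Build each page URL from the base url instead of appending to it repeatedly
    page_url = url + "/page/{}/".format(page)
    print("======== Page {} ============".format(page))
    get_contents(page_url, page)
    page += 1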
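The page arithmetic above can be isolated into two small helpers, which makes it easy to check without touching the network. This is only a sketch: `page_count` and `page_urls` are hypothetical names that are not in the original script, and the `/page/N/` URL pattern is taken from the snippet above.

```python
def page_count(counts, per_page=50):
    """Return how many pages are needed for `counts` records (ceiling division)."""
    return (counts + per_page - 1) // per_page


def page_urls(base_url, counts, per_page=50):
    """Yield (page, url) pairs for page 1 up to the last page."""
    base = base_url.rstrip("/")  # avoid a double slash before "page"
    for page in range(1, page_count(counts, per_page) + 1):
        yield page, "{}/page/{}/".format(base, page)
```

Each pair from `page_urls(url, counts)` can then be passed to `get_contents(page_url, page)`. Because the URL is rebuilt from the base every time, earlier page suffixes never pile up.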
Getting the details
import requests
from bs4 import BeautifulSoup

def get_detail(url):
    # headers is defined elsewhere in the full script
    html = requests.get(url, headers=headers, verify=False)
    soup = BeautifulSoup(html.text, "lxml")
    # Get the name
    name = soup.find('div', attrs={'class': 'single_baby_name_title'}).find('h1').text
    # Get the spans that hold Meaning, Gender and Origin
    s = soup.find('div', attrs={'class': 'single_baby_name_description'}).find_all('span')
    Meaning = s[0].text  # Get Meaning
    Gender = s[1].text  # Get Gender
    Origin = s[3].text  # Get Origin
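Indexing the spans directly (`s[0]`, `s[1]`, `s[3]`) raises `IndexError` if a page has fewer spans than expected. Whether the site's pages actually vary like that is an assumption, but a small guard costs little. The sketch below uses a hypothetical helper, `pick_fields`, that works on the span texts and keeps the same index positions as the code above.

```python
def pick_fields(span_texts):
    """Map span texts to Meaning/Gender/Origin, using '' for missing spans."""
    def at(index):
        # Return the stripped text at `index`, or '' if the list is too short
        return span_texts[index].strip() if index < len(span_texts) else ""

    # Positions 0, 1 and 3 mirror the indices used in get_detail
    return {"Meaning": at(0), "Gender": at(1), "Origin": at(3)}
```

Inside `get_detail` it could be called as `pick_fields([span.text for span in s])`, so a short page yields empty strings instead of stopping the whole crawl.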
Saving to CSV
import csv

# Save the data to CSV (append mode, so each call adds one row)
with open("baby_name.csv", 'a+', encoding="utf-8-sig", newline='') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow([name, Meaning, Gender, Origin])
    print("========= Data saved successfully ==========")
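Since the file is opened in append mode, it never gets a header row. One possible way to add one is to write it only when the file is new or empty, as sketched below. The `append_row` helper and the `FIELDS` column names are assumptions for illustration, not part of the original script.

```python
import csv
import os

# Hypothetical column names, matching the order of the values written above
FIELDS = ["name", "Meaning", "Gender", "Origin"]


def append_row(path, row):
    """Append one row to the CSV, writing a header first if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(FIELDS)
        writer.writerow(row)
```

A call such as `append_row("baby_name.csv", [name, Meaning, Gender, Origin])` would then replace the `with` block, and repeated runs keep a single header at the top of the file.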
I worked this out gradually through trial and error, and learned quite a lot along the way.
Original article: https://www.cnblogs.com/llbb/p/12081491.html