Parsing HTML with Python's BeautifulSoup

Date: 2014-07-10 19:12:38

Tags: shell

http://www.cnblogs.com/twinsclover/archive/2012/04/26/2471704.html Parsing HTML with Python's BeautifulSoup

http://www.crummy.com/software/BeautifulSoup/bs3/documentation.zh.html Beautiful Soup documentation (Chinese)

1) Searching for tags:

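find(tagname)        # match the tag with the given name, e.g. find('head')
find(list)           # match any tag whose name is in the list, e.g. find(['head', 'body'])
find(dict)           # match any tag whose name is a key of the dict, e.g. find({'head': True, 'body': True}) (form documented for Beautiful Soup 3)
find(re.compile('')) # match tags whose name matches the regex, e.g. find(re.compile('^p')) finds tags whose names start with p
find(lambda)         # match tags for which the function returns true, e.g. find(lambda tag: len(tag.name) == 1) finds tags with one-character names
find(True)           # match every tag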
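
The filter types listed above can be tried on a small document. This is a minimal sketch, assuming bs4 is installed and using the standard-library "html.parser" backend; the sample markup and variable names are made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# Illustrative, well-formed sample markup (not taken from any real page).
html = (
    '<html><head><title>Page title</title></head>'
    '<body><p id="firstpara">This is paragraph <b>one</b>.</p>'
    '<p id="secondpara">This is paragraph <b>two</b>.</p>'
    '</body></html>'
)
soup = BeautifulSoup(html, "html.parser")

# find() returns the first match in document order (or None).
by_name = soup.find("head")                           # exact tag name
by_list = soup.find(["title", "p"])                   # any name in the list
by_regex = soup.find(re.compile("^p"))                # name starts with "p"
by_func = soup.find(lambda tag: len(tag.name) == 1)   # one-character name

# True matches every tag, so this lists all tag names in document order.
all_tags = [tag.name for tag in soup.find_all(True)]

print(by_name.name, by_list.name, by_regex["id"], by_func.name)
print(all_tags)
```

The function filter receives a Tag object rather than a bare name, which is why the example reads tag.name.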

2) Searching for text (text):
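
Text searches accept the same kinds of filters as tag searches, but they match text nodes and return strings rather than tags. The sketch below assumes bs4 4.4 or later, where the argument is spelled string= (older code uses text=); the sample markup is made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# Illustrative sample markup.
soup = BeautifulSoup(
    '<p>This is paragraph <b>one</b>.</p>'
    '<p>This is paragraph <b>two</b>.</p>',
    "html.parser",
)

# Regex filter: text nodes that contain "paragraph".
matching = soup.find_all(string=re.compile("paragraph"))

# True matches every text node.
every_string = soup.find_all(string=True)

# Function filter: text nodes shorter than 12 characters.
short = soup.find_all(string=lambda s: len(s) < 12)

# Plain string filter: text nodes equal to "one".
exact = soup.find_all(string="one")

print(matching)
print(every_string)
print(short)
print(exact)
```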

3) recursive, limit:

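from bs4 import BeautifulSoup
import re

doc = ['<html><head><title>Page title</title></head>',
       '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
       '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
       '</html>']
# Name the parser explicitly; bs4 warns when it has to guess one.
soup = BeautifulSoup(''.join(doc), 'html.parser')

print(soup.prettify() + "\n")
print(soup.find_all('b'))  # find_all is the bs4 name for BS3's findAll

# Text searches: string= is the bs4 (4.4+) spelling of the older text= argument.
print(soup.find_all(string=re.compile("paragraph")))
print(soup.find_all(string=True))
print(soup.find_all(string=lambda x: len(x) < 12))

# Tags whose names start with "b" (body, b).
a = soup.find_all(re.compile('^b'))
print([tag.name for tag in a])

# All descendants of <html>, then only its direct children.
print([tag.name for tag in soup.html.find_all()])
print([tag.name for tag in soup.html.find_all(recursive=False)])

# Stop after the first match.
print(soup.find_all('p', limit=1))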
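
To isolate what recursive and limit do, here is a minimal sketch on a small well-formed document, assuming bs4 with the "html.parser" backend; the markup and variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative sample markup with three paragraphs.
html = (
    '<html><head><title>Page title</title></head>'
    '<body><p>one</p><p>two</p><p>three</p></body></html>'
)
soup = BeautifulSoup(html, "html.parser")

# Default (recursive=True): every descendant tag of <html>.
descendants = [t.name for t in soup.html.find_all()]

# recursive=False: only the direct children of <html>.
children = [t.name for t in soup.html.find_all(recursive=False)]

# limit=2: stop searching after two matches.
first_two = soup.find_all("p", limit=2)

print(descendants)
print(children)
print([p.get_text() for p in first_two])
```

With limit=1, find_all still returns a list (of at most one element), whereas find returns the element itself or None.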


Original article: http://dragonball.blog.51cto.com/1459915/1436515
