Tags: style blog http color io usage file data div
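I recently needed to do some page analysis again. Until now I had always used AnyEvent::HTTP and Web::Scraper; this time I tried Mojo::DOM and Mojo::UserAgent.
Conclusion first: if the program has nothing to do with the web and is purely a page-analysis or file-processing tool, the former pair is still the better choice. Otherwise, Mojo is worth considering.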
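First, the strengths of Mojo::DOM and Mojo::UserAgent:
The DOM selector that Mojo::DOM provides is very convenient in some situations.
Once the HTML has been read in, you can pinpoint exactly the elements you need, or traverse them with callbacks.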
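As a minimal sketch of both styles (the HTML fragment, ids and class names below are made up for illustration), `at` returns the first match for a CSS selector, while `find` returns a collection that `each` walks with a callback:

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical HTML fragment, used only for illustration.
my $html = <<'HTML';
<div id="posts">
  <h2 class="title"><a href="/post/1">First post</a></h2>
  <h2 class="title"><a href="/post/2">Second post</a></h2>
</div>
HTML

my $dom = Mojo::DOM->new($html);

# Pinpoint a single element with a CSS selector (first match only).
my $first_title = $dom->at('#posts h2.title a')->text;

# Traverse every match with a callback.
my @links;
$dom->find('h2.title a')->each(sub {
    my ($el, $num) = @_;    # element and its 1-based position
    push @links, { href => $el->attr('href'), text => $el->text };
});

print "$first_title\n";
print "$_->{href} => $_->{text}\n" for @links;
```

Note that `at` returns undef when nothing matches, so real code should check the result before calling `text` on it.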
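It gets even more convenient in combination with Mojo::UserAgent. Mojo::UserAgent is feature-rich, but if you do not want those features you can simply treat it as a wget (an HTTP client). It supports both blocking and non-blocking GET requests, and it integrates well with Mojo::DOM.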
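A rough sketch of what that integration can look like, assuming a reasonably recent Mojolicious (the `result` method is not available in very old releases, where `res` plays a similar role). The helper names `summarize`, `fetch_blocking` and `fetch_nonblocking` are my own, not part of Mojolicious, and the script only goes online when a URL is passed on the command line:

```perl
use strict;
use warnings;
use Mojo::DOM;
use Mojo::IOLoop;
use Mojo::UserAgent;

# Pull the page title and all link targets out of a parsed page.
sub summarize {
    my ($dom) = @_;
    my $title = $dom->at('title');
    return {
        title => $title ? $title->text : undef,
        links => $dom->find('a[href]')->map(attr => 'href')->to_array,
    };
}

# Blocking GET: returns once the whole response has arrived.
sub fetch_blocking {
    my ($ua, $url) = @_;
    return summarize($ua->get($url)->result->dom);
}

# Non-blocking GET: the callback fires later, from the event loop.
sub fetch_nonblocking {
    my ($ua, $url, $on_done) = @_;
    $ua->get($url => sub {
        my ($ua, $tx) = @_;
        my $res = eval { $tx->result };    # result() dies on connection errors
        $on_done->($res ? summarize($res->dom) : undef);
    });
    return;
}

# Only touch the network when a URL is given on the command line.
if (my $url = shift @ARGV) {
    my $ua = Mojo::UserAgent->new(max_redirects => 3);
    print "blocking: ", fetch_blocking($ua, $url)->{title} // '(no title)', "\n";

    fetch_nonblocking($ua, $url, sub {
        my ($page) = @_;
        print "non-blocking: ", ($page ? scalar @{ $page->{links} } : 0), " links\n";
        Mojo::IOLoop->stop;
    });
    Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
}
```

The response object hands back a Mojo::DOM via `dom`, so fetching and selecting chain together without any glue code.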
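Things get even better once all of this is placed inside the Mojolicious web framework: everything is written by the same author, so the pieces fit together very well. Work that used to be a major undertaking now takes two or three lines of code.
All of the above looks great, so here are what I see as the drawbacks.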
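1. No XPath support.
I know XPath well, but unfortunately it is not supported. Many things can be done the Mojo way, yet I can still name some features I use regularly that have no equivalent. I also suspect that performance is considerably worse as a result. Web::Scraper uses XPath and can parse HTML/XML with XML::LibXML, which is currently the fastest of all the DOM approaches (libxml2 > expat). So I believe a pure-Perl, non-XPath DOM selector is not efficient enough for large-scale data analysis. (This is only a guess.)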
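2. It may just be my habits, but for complex pages I still prefer Web::Scraper.
Anyone who has used Web::Scraper knows the workflow: you first write one uniform set of XPath rules that fits a class of pages, then apply that whole rule set to every page of that class. For information-heavy pages the rule set can run to dozens or even hundreds of lines. With Mojo::DOM you can only wrap lots of find->each calls and Perl callbacks together, which is awkward to debug, and whoever writes the page-analysis rules also has to know Perl.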
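To make the "rule set" idea concrete, here is a small sketch of a Web::Scraper rule block. The page markup and the field names (`site`, `titles`, `urls`, `dates`) are invented for the example; a real rule set for a complex page would simply have many more `process` lines of the same shape:

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical listing page; markup and field names are made up.
my $html = <<'HTML';
<html><body>
  <h1 id="site">Demo blog</h1>
  <div class="post"><h2><a href="/post/1">First post</a></h2><span class="date">2014-09-15</span></div>
  <div class="post"><h2><a href="/post/2">Second post</a></h2><span class="date">2014-09-16</span></div>
</body></html>
HTML

# The rule set: one XPath expression per field, no traversal code.
# A key ending in [] collects every match into an array reference.
my $rules = scraper {
    process '//h1[@id="site"]',                         site       => 'TEXT';
    process '//div[@class="post"]/h2/a',                'titles[]' => 'TEXT';
    process '//div[@class="post"]/h2/a',                'urls[]'   => '@href';
    process '//div[@class="post"]/span[@class="date"]', 'dates[]'  => 'TEXT';
};

# The same rule object can be applied to every page of this type.
my $data = $rules->scrape($html);

print "$data->{site}\n";
print "$_\n" for @{ $data->{titles} };
```

Each rule is just a selector plus a field name, which is why such a block can be maintained without writing callbacks.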
3. Coro::rouse_cb and Coro::rouse_wait can no longer be used.
The one above works. The one below does not.
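For readers who have not met the idiom, this is a sketch of what Coro::rouse_cb and Coro::rouse_wait do under an AnyEvent event loop: they turn a callback-style call into straight-line code. It is not the author's code. `async_lookup` is a hypothetical stand-in for any callback-based API, simulated here with a timer so the example needs no network:

```perl
use strict;
use warnings;
use AnyEvent;
use Coro;
use Coro::AnyEvent;    # lets a waiting coro hand control to the event loop

# Hypothetical callback-style API: passes "$key-value" to $cb after a delay.
sub async_lookup {
    my ($key, $cb) = @_;
    my $timer;
    $timer = AnyEvent->timer(
        after => 0.05,
        cb    => sub {
            undef $timer;          # release the watcher
            $cb->("$key-value");
        },
    );
    return;
}

# Looks synchronous to the caller, but only the current coro is suspended.
sub lookup {
    my ($key) = @_;
    async_lookup($key, Coro::rouse_cb);    # callback that will wake us up
    my ($value) = Coro::rouse_wait;        # sleep until it has been called
    return $value;
}

my @values = map { lookup($_) } qw(a b);
print "@values\n";

# With AnyEvent::HTTP the same pattern would read:
#   http_get $url, Coro::rouse_cb;
#   my ($body, $headers) = Coro::rouse_wait;
```

rouse_wait returns whatever arguments the callback was invoked with, so the result of the asynchronous call arrives as an ordinary return value.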
An introduction to Mojolicious's DOM selector Mojo::DOM and its Mojo::UserAgent (compared with Web::Scraper)
Original article: http://www.cnblogs.com/perl2014/p/3972894.html