
An introduction to Mojolicious's DOM selector Mojo::DOM and its Mojo::UserAgent (compared with Web::Scraper)

Date: 2014-09-15 15:45:39


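Recently I needed to do some page analysis again. In the past I always used AnyEvent::HTTP and Web::Scraper. This time I tried Mojo::DOM and Mojo::UserAgent.
Conclusion first: if the program has nothing to do with the web and is just a page-analysis or file-processing program, the former pair is still the better choice. Otherwise, Mojo is worth considering.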

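First, the strengths of Mojo::DOM and Mojo::UserAgent:
The DOM selector that Mojo::DOM provides is very convenient in many situations.
Once the HTML is loaded, you can locate exactly the elements you need, or walk over them with a callback.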

    use Mojo::DOM;
    my $dom = Mojo::DOM->new($html_string);
    # print the id of every <p> that has one (say needs "use feature 'say'")
    $dom->find('p[id]')->each(sub { say shift->{id} });
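To make the two styles mentioned above concrete (precise lookup versus callback traversal), here is a minimal self-contained sketch. The sample markup and the variable names are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Hypothetical sample markup, only for illustration
my $html = <<'HTML';
<html>
  <head><title>Demo</title></head>
  <body>
    <p id="intro">Hello</p>
    <p>No id here</p>
    <p id="outro">Bye</p>
  </body>
</html>
HTML

my $dom = Mojo::DOM->new($html);

# Precise lookup: at() returns the first matching element
my $title = $dom->at('head title')->text;

# Callback traversal: collect the id of every <p> that has one
my @ids;
$dom->find('p[id]')->each(sub { push @ids, $_->{id} });

say $title;            # Demo
say join ',', @ids;    # intro,outro
```

Note that at() returns undef when nothing matches, so real code would probably check the result before calling text on it.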

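It gets even more convenient when combined with Mojo::UserAgent. Mojo::UserAgent has a rich feature set, but if you do not want any of that, you can treat it as a plain wget (an HTTP client). It supports both synchronous GET and non-blocking GET, and it integrates well with Mojo::DOM. For example: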

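    use Mojo::UserAgent;
    my $ua = Mojo::UserAgent->new;
    my $tx = $ua->get('http://www.example.com/');  # blocking GET; URL borrowed from the later examples
    my $title = $tx->res->dom->at('head title')->text;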
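As a rough sketch of the blocking and non-blocking styles side by side: the tiny embedded Mojolicious::Lite app below is purely an assumption of this example, added so that it runs without network access. The page content and variable names are hypothetical.

```perl
use Mojolicious::Lite;   # also turns on strict, warnings and say
use Mojo::UserAgent;
use Mojo::IOLoop;

# Hypothetical stand-in page, so no real site is contacted
get '/' => {text => '<html><head><title>Example Domain</title></head><body></body></html>'};
app->log->level('fatal');   # keep request logging quiet

my $ua = Mojo::UserAgent->new;
$ua->server->app(app);      # relative URLs are served by the embedded app

# Blocking: get() returns the finished transaction
my $tx    = $ua->get('/');
my $title = $tx->res->dom->at('head title')->text;
say "blocking: $title";

# Non-blocking: pass a callback, then run the event loop until it fires
my $nb_title;
$ua->get('/' => sub {
    my ($ua, $tx) = @_;
    $nb_title = $tx->res->dom->at('head title')->text;
    Mojo::IOLoop->stop;
});
Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
say "non-blocking: $nb_title";
```

Against a real site you would drop the embedded app and pass an absolute URL instead.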

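Things get even nicer once all of this sits inside the Mojolicious web framework: everything comes from the same author, so the pieces fit together very well. Work that used to take a lot of effort is now done in two or three lines of code.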

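All of that looks great. Now for what I see as the drawbacks.
1. No XPath support.
I know XPath well, but unfortunately it is not supported. Many things can be done the Mojo way, but I can still name features I use regularly that have no equivalent. I also suspect that performance suffers considerably because of this. Web::Scraper uses XPath and can parse HTML/XML with XML::LibXML, which is currently the fastest of all the DOM approaches (libxml2 > expat). So I believe a pure-Perl, non-XPath DOM selector is not efficient enough for large-scale data analysis. (This is only a guess.)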
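For readers coming from XPath, the following sketch shows how a few common XPath idioms might be approximated with CSS selectors plus Mojo::DOM's navigation methods (next, parent). The table markup is hypothetical, and the sketch deliberately sticks to basic selectors; it does not claim to cover everything XPath can express.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Hypothetical table markup, only for illustration
my $dom = Mojo::DOM->new(<<'HTML');
<table>
  <tr class="row"><td>Apples</td><td>3</td></tr>
  <tr class="row"><td>Pears</td><td>5</td></tr>
</table>
HTML

# XPath: //tr[@class="row"]/td[2]  ->  positional CSS selector
my @counts = $dom->find('tr.row > td:nth-child(2)')->map('text')->each;

# XPath: //td[text()="Pears"]/following-sibling::td[1]
# Filter on the text in Perl, then step to the next sibling element
my $pears      = $dom->find('td')->first(sub { $_->text eq 'Pears' });
my $pear_count = $pears->next->text;

# XPath: //td[text()="Pears"]/parent::tr/@class
my $row_class = $pears->parent->{class};

say "@counts";
say $pear_count;
say $row_class;
```

The text filter and the sibling/parent steps happen in Perl rather than in the selector, which is the kind of extra code the author is objecting to.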

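2. It may just be my habits, but for complex pages I still prefer Web::Scraper.
Anyone who has used Web::Scraper knows the workflow: you first write, in XPath, a uniform set of rules that fits a certain class of pages, and then apply that whole rule set to every page of that class. When the page content is complex, such a rule set can run to dozens or even hundreds of lines. With Mojo::DOM, all you can do is wrap lots of find->each calls and Perl callbacks around each other, which is awkward to debug, and whoever writes the page-analysis rules also has to know Perl.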
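A small sketch of what such a declarative rule set can look like with Web::Scraper. The markup, the XPath expressions and the key names (title, links, labels) are made up for illustration; a real rule set for a complex page would be much longer.

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical page, only for illustration
my $html = <<'HTML';
<html><body>
  <h1>Site news</h1>
  <ul id="news">
    <li><a href="/a">First</a></li>
    <li><a href="/b">Second</a></li>
  </ul>
</body></html>
HTML

# The rule set: each process line maps an XPath expression to a result key
my $rules = scraper {
    process '//h1',                  'title'    => 'TEXT';
    process '//ul[@id="news"]/li/a', 'links[]'  => '@href';
    process '//ul[@id="news"]/li/a', 'labels[]' => 'TEXT';
};

# The same rule set can be applied to any page of the same shape
my $data = $rules->scrape($html);

print "$data->{title}\n";
print scalar(@{ $data->{labels} }), " items\n";
```

The rules are data-like declarations rather than nested callbacks, which is the property the author values when many similar pages have to be handled.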

3. Coro::rouse_cb and Coro::rouse_wait can no longer be used.

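    use Coro;
    use AnyEvent::HTTP;
    use Data::Dumper;

    my $coro = async {
        # rouse_cb creates a callback; rouse_wait suspends this coroutine until it is called
        http_get "http://www.example.com/", Coro::rouse_cb;
        my ($data, $header) = Coro::rouse_wait;
        print Dumper $header;
    };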

The version above works. The version below does not.

    use Coro;
    use Mojo::UserAgent;

    my $coro = async {
        my $ua = Mojo::UserAgent->new;
        # non-blocking get, with Coro::rouse_cb as the callback
        $ua->get('http://www.example.com/' => Coro::rouse_cb);
        my ($ua2, $tx) = Coro::rouse_wait;
        my $title = $tx->res->dom->at('head title')->text;
        print "$title\n";
    };

Original article: http://www.cnblogs.com/perl2014/p/3972894.html
