
A Crawler Without Writing Code

Date: 2019-12-24 12:16:03

Tags: view, info, read, case study, src, regex, parent, tor, data collection

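A crawler that needs no code: click around with the mouse and the data comes pouring in. Collecting data has never been this easy. Sales staff, web operations and marketing people, web editors, SEO specialists, and others who cannot program can easily collect data from most common websites.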

 

 

An example: collecting the post data from the first 5 pages of cnblogs (博客园).

I am recording the configuration here in case it is needed later.

{
  "_id": "cnblogs",
  "startUrl": ["https://www.cnblogs.com/#p[1-5:1]"],
  "selectors": [
    {"id":"blog","type":"SelectorElement","parentSelectors":["_root"],"selector":"div.post_item","multiple":true,"delay":0},
    {"id":"title","type":"SelectorText","parentSelectors":["blog"],"selector":"a.titlelnk","multiple":false,"regex":"","delay":0},
    {"id":"url","type":"SelectorElementAttribute","parentSelectors":["blog"],"selector":"a.titlelnk","multiple":false,"extractAttribute":"href","delay":0},
    {"id":"desc","type":"SelectorText","parentSelectors":["blog"],"selector":"p","multiple":false,"regex":"","delay":0},
    {"id":"read","type":"SelectorText","parentSelectors":["blog"],"selector":".article_view a","multiple":false,"regex":"","delay":0},
    {"id":"pinglun","type":"SelectorText","parentSelectors":["blog"],"selector":".article_comment a","multiple":false,"regex":"","delay":0},
    {"id":"date","type":"SelectorText","parentSelectors":["blog"],"selector":"div.post_item_foot","multiple":false,"regex":"","delay":0}
  ]
}
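
The post does not name the tool, but the configuration above looks like a sitemap in the format used by the Web Scraper browser extension. For anyone reading it later, here is a rough Python sketch of what it describes: the `[1-5:1]` range in the start URL stands for pages 1 to 5 in steps of 1, and every selector other than `blog` is a child of `blog`, so each `div.post_item` element yields one record. The range syntax is handled as I understand it (`[start-end]` with an optional `:step`, zero padding taken from the start value), and the helper names `expand_start_url` and `selector_tree` are made up for this illustration. The sketch only inspects the sitemap and fetches nothing.

```python
import json
import re

# The sitemap recorded in this post, copied as-is.
SITEMAP_JSON = """
{"_id":"cnblogs","startUrl":["https://www.cnblogs.com/#p[1-5:1]"],"selectors":[
{"id":"blog","type":"SelectorElement","parentSelectors":["_root"],"selector":"div.post_item","multiple":true,"delay":0},
{"id":"title","type":"SelectorText","parentSelectors":["blog"],"selector":"a.titlelnk","multiple":false,"regex":"","delay":0},
{"id":"url","type":"SelectorElementAttribute","parentSelectors":["blog"],"selector":"a.titlelnk","multiple":false,"extractAttribute":"href","delay":0},
{"id":"desc","type":"SelectorText","parentSelectors":["blog"],"selector":"p","multiple":false,"regex":"","delay":0},
{"id":"read","type":"SelectorText","parentSelectors":["blog"],"selector":".article_view a","multiple":false,"regex":"","delay":0},
{"id":"pinglun","type":"SelectorText","parentSelectors":["blog"],"selector":".article_comment a","multiple":false,"regex":"","delay":0},
{"id":"date","type":"SelectorText","parentSelectors":["blog"],"selector":"div.post_item_foot","multiple":false,"regex":"","delay":0}
]}
"""

# Assumed range syntax: [1-5], [001-100] or [0-100:10].
RANGE_RE = re.compile(r"\[(\d+)-(\d+)(?::(\d+))?\]")


def expand_start_url(url):
    """Expand the first [start-end:step] range in url into concrete URLs."""
    match = RANGE_RE.search(url)
    if match is None:
        return [url]  # no range present, the URL is used unchanged
    start_text, end_text, step_text = match.groups()
    step = int(step_text) if step_text else 1
    if step < 1:
        raise ValueError("step must be at least 1")
    width = len(start_text)  # "001" pads every number to 3 digits
    prefix, suffix = url[:match.start()], url[match.end():]
    return [
        prefix + str(n).zfill(width) + suffix
        for n in range(int(start_text), int(end_text) + 1, step)
    ]


def selector_tree(sitemap):
    """Map each parent selector id to the ids of its child selectors."""
    tree = {}
    for selector in sitemap["selectors"]:
        for parent in selector["parentSelectors"]:
            tree.setdefault(parent, []).append(selector["id"])
    return tree


sitemap = json.loads(SITEMAP_JSON)
page_urls = [u for start in sitemap["startUrl"] for u in expand_start_url(start)]
tree = selector_tree(sitemap)

for page_url in page_urls:
    print(page_url)
for parent, children in tree.items():
    print(parent, "->", ", ".join(children))
```

To reuse the sitemap for a different number of pages, changing the range in `startUrl` (for example `[1-10:1]`) should be enough; the selectors stay the same.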

Original source: https://www.cnblogs.com/fly-kaka/p/12090232.html
