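Symptom: this site should yield close to 500 URLs in total, but only 100 were actually extracted.
Explanation: by default, Nutch keeps only the first 100 links parsed from a single page.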
<property>
  <name>db.max.outlinks.per.page</name>
  <value>100</value>
  <description>The maximum number of outlinks that we'll process for a page.
  If this value is nonnegative (>=0), at most db.max.outlinks.per.page outlinks
  will be processed for a page; otherwise, all outlinks will be processed.
  </description>
</property>
Increase this value, for example to 1000.
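A minimal sketch of the override, assuming the usual Nutch layout in which site-specific settings go in conf/nutch-site.xml rather than being edited directly in nutch-default.xml. The value 1000 is the one suggested above, and the description text here is my own wording, not Nutch's.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml (assumed location for local overrides) -->
<configuration>
  <property>
    <name>db.max.outlinks.per.page</name>
    <!-- Raised from the default of 100 so pages with many links are not truncated -->
    <value>1000</value>
    <description>Process up to 1000 outlinks per page.</description>
  </property>
</configuration>
```

Per the property's own description, a negative value such as -1 should lift the limit entirely, which may be preferable if pages on the site can carry more than 1000 links.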
Original source: http://www.cnblogs.com/i80386/p/3957763.html