码迷,mamicode.com
首页 > 其他好文 > 详细

配置nutch

时间:2016-01-05 18:29:04      阅读:151      评论:0      收藏:0      [点我收藏+]

标签:

配置nutch

(nutch文件夹已在/home目录下)

1. 修改系统环境变量

sudo gedit /etc/profile

 //增加

#set nutch
export PATH=/home/nutch/runtime/local/bin:$PATH

 

2. 测试(nutch/runtime/local/bin中./nutch  &  ./crawl)

nutch
//结果如下:
Usage: nutch COMMAND
where COMMAND is one of:
 inject		inject new urls into the database
 hostinject     creates or updates an existing host table from a text file
 generate 	generate new batches to fetch from crawl db
 fetch 		fetch URLs marked during generate
 parse 		parse URLs marked during fetch
 updatedb 	update web table after parsing
 updatehostdb   update host table after parsing
 readdb 	read/dump records from page database
 readhostdb     display entries from the hostDB
 elasticindex   run the elasticsearch indexer
 solrindex 	run the solr indexer on parsed batches
 solrdedup 	remove duplicates from solr
 parsechecker   check the parser for a given url
 indexchecker   check the indexing filters for a given url
 plugin 	load a plugin and run one of its classes main()
 nutchserver    run a (local) Nutch server on a user defined port
 junit         	runs the given JUnit test
 or
 CLASSNAME 	run the class named CLASSNAME
Most commands print help when invoked w/o parameters.

 

crawl
//结果如下:
Missing seedDir : crawl <seedDir> <crawlID> <solrURL> <numberOfRounds>

配置nutch

标签:

原文地址:http://www.cnblogs.com/timdes/p/5103236.html

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!