Integrating Nutch and Solr on Hadoop

Date: 2014-06-18 10:30:02

Nutch is an application; in this project I use it mainly as a crawler. The crawled content is stored on HDFS, so the HDFS integration part is already in place.

Solr:

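In Eclipse, create a new Dynamic Web Project and delete everything under WebContent.

Unpack solr.war, which is under solr/dist (or under /solr3.6.2/example/webapps), and copy all of its contents into WebContent.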
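If you would rather not unpack the archive by hand, the step above can be sketched in plain Java, since a .war file is an ordinary zip archive. This is only a rough sketch: the class name WarExtractor and the command-line arguments are my own placeholders, not part of Solr or Eclipse.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Solr: unpacks a .war (zip) archive into a directory.
class WarExtractor {

    /** Extracts every entry of the archive at warFile into targetDir. */
    public static void extract(Path warFile, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(warFile))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries whose path would escape the target directory.
                if (!out.startsWith(root)) {
                    throw new IOException("Unsafe entry: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies only the bytes of the current entry.
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: WarExtractor <path to solr.war> <path to WebContent>");
            return;
        }
        extract(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Pass the path to solr.war as the first argument and the WebContent directory of the new project as the second, then refresh the project in Eclipse so the copied files show up.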

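Edit web.xml in WEB-INF.

Add

<env-entry>
   <env-entry-name>solr/home</env-entry-name>
   <env-entry-value>/home/hadoop/solr3.6.2/example/solr</env-entry-value>
   <env-entry-type>java.lang.String</env-entry-type>
</env-entry>

just before the closing </web-app> tag at the end of the file.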

To explain: this path is the location of your Solr core, that is, the Solr home directory.
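To make it clearer why the solr/home entry matters, here is a simplified sketch of how, as far as I understand Solr 3.x, the home directory gets resolved: first the JNDI entry java:comp/env/solr/home (which is what the web.xml entry provides), then the system property solr.solr.home, and finally the relative default solr/. The class SolrHomeLocator is made up for illustration and is not Solr's own code.

```java
import javax.naming.InitialContext;
import javax.naming.NamingException;

// Illustrative only: mimics the lookup order, it is not Solr's implementation.
class SolrHomeLocator {

    /** Returns the Solr home: JNDI entry first, then system property, then "solr/". */
    public static String locate() {
        try {
            // An env-entry named solr/home in web.xml is exposed under java:comp/env.
            Object value = new InitialContext().lookup("java:comp/env/solr/home");
            if (value != null) {
                return value.toString();
            }
        } catch (NamingException | RuntimeException e) {
            // No JNDI context (for example outside a servlet container) or no such entry.
        }
        String prop = System.getProperty("solr.solr.home");
        return prop != null ? prop : "solr/";
    }

    public static void main(String[] args) {
        System.out.println("Solr home resolves to: " + locate());
    }
}
```

Run outside Tomcat, the JNDI lookup fails and the method falls back to the system property or the default, which is why a missing or misplaced entry in web.xml tends to show up as Solr looking for its configuration in the wrong directory.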

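If you use Solr multicore, you can point the value at

/home/hadoop/solr3.6.2/example/multicore instead, and also edit solr.xml in the multicore directory,

setting instanceDir to the location where each core is stored.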
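As a rough way to double-check where each instanceDir ends up, the sketch below parses a solr.xml and resolves every core's instanceDir against the Solr home. The embedded sample is shaped like the multicore solr.xml shipped with the Solr 3.x example (core0 and core1); your own file may differ, and the class name CoreLister is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for inspecting a multicore solr.xml.
class CoreLister {

    // Sample content, assumed to match the layout of the example multicore solr.xml.
    public static final String SAMPLE =
          "<solr persistent=\"false\">"
        + "<cores adminPath=\"/admin/cores\">"
        + "<core name=\"core0\" instanceDir=\"core0\"/>"
        + "<core name=\"core1\" instanceDir=\"core1\"/>"
        + "</cores></solr>";

    /** Maps each core name to its instanceDir resolved against solrHome. */
    public static Map<String, Path> resolveCores(String solrXml, Path solrHome) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(solrXml.getBytes(StandardCharsets.UTF_8)));
        NodeList cores = doc.getElementsByTagName("core");
        Map<String, Path> result = new LinkedHashMap<>();
        for (int i = 0; i < cores.getLength(); i++) {
            Element core = (Element) cores.item(i);
            // A relative instanceDir is taken relative to the Solr home directory;
            // an absolute one is kept as it is.
            Path dir = solrHome.resolve(core.getAttribute("instanceDir")).normalize();
            result.put(core.getAttribute("name"), dir);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Path home = Paths.get("/home/hadoop/solr3.6.2/example/multicore");
        for (Map.Entry<String, Path> e : resolveCores(SAMPLE, home).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Each resolved directory should contain that core's conf folder; if it does not, the instanceDir value is the first thing to check.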

In the Servers view, create a new Tomcat 7 server, then add the Dynamic Web Project you just created to it.
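Once Tomcat is running, one quick way to confirm that the web application responds is to request Solr's select handler. The sketch below only builds such a URL; the base address http://localhost:8080/solr is an assumption (the context path depends on what you named the Eclipse project), and SolrQueryUrl is a made-up helper name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a URL for Solr's /select handler.
class SolrQueryUrl {

    /** Builds a select URL for the given query, asking for rows results as JSON. */
    public static String build(String baseUrl, String query, int rows)
            throws UnsupportedEncodingException {
        // Drop a trailing slash so the path does not end up with "//".
        String base = baseUrl.endsWith("/")
            ? baseUrl.substring(0, baseUrl.length() - 1)
            : baseUrl;
        return base + "/select?q=" + URLEncoder.encode(query, "UTF-8")
            + "&rows=" + rows + "&wt=json";
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Assumed base URL; adjust host, port and context path to your setup.
        System.out.println(build("http://localhost:8080/solr", "*:*", 5));
    }
}
```

Opening the printed URL in a browser should return a JSON response if the deployment and the solr/home setting are correct; with an empty index the result list is simply empty.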

Create the indexwrite and start crawling the resources:

indexwrite.sprite("http://www.metabase.cn/","utf-8");// resource URL, encoding UTF-8

indexwrite.sprite("http://www.jinanwuliangye.com/","utf-8");// resource URL, encoding UTF-8

indexwrite.sprite("http://www.tongxinglong.com/","utf-8");// resource URL, encoding UTF-8

indexwrite.sprite("http://www.qclchina.com/","utf-8");// resource URL, encoding UTF-8

indexwrite.sprite("http://www.vipfuxin.com/","utf-8");// resource URL, encoding UTF-8

indexwrite.sprite("http://www.minnan888.net/","utf-8");// resource URL, encoding UTF-8

indexwrite.sprite("http://www.lcsyt.com/","utf-8");// resource URL, encoding UTF-8

indexwrite.sprite("http://lf.yunnanw.cn/","utf-8");// resource URL, encoding UTF-8

indexwrite.sprite("http://www.yzbljp.com/","utf-8");// resource URL, encoding UTF-8

indexwrite.sprite("http://www.hyyfscl.com/","utf-8");// resource URL, encoding UTF-8

indexwrite.sprite("http://www.shoudashou.com/","utf-8");// resource URL, encoding UTF-8

indexwrite.sprite("http://www.shuoma.com.cn/","utf-8");// resource URL, encoding UTF-8

inputStream.close();// close the input stream once crawling is done

Original article: http://www.cnblogs.com/haomad/p/3793222.html
