码迷,mamicode.com
首页 > 其他好文 > 详细

anthelion编译

时间:2016-01-30 18:15:59      阅读:337      评论:0      收藏:0      [点我收藏+]

标签:

编程工程

$ cd ./anthelion/anthelion/target/classes
$ java -Xmx15G -cp ../Anthelion-1.0.0-jar-with-dependencies.jar com.yahoo.research.robme.anthelion.simulation.CCFakeCrawler ./index ./network ./label ../../config/baseline.properties result.log

Necessary files:

  • index: the mapping between ID and URL
  • network: the graph including the IDs from the index
  • label: list of the IDs which fulfil the target function
  • properties: configuration file (a set of configuration files can be found in the resource folder of the distribution)
  • result: the location where the information about the performance and the crawling process are stored

The files which we used to measure the performance when crawling for HTML pages including Microdata, Microformats and RDFa can be found on the dedicated page of the WebDataCommons project: http://webdatacommons.org/structureddata/anthelion/

Available actions within the simulation process:

  • Run "init" to initialize the crawler (loading the network, labels and create the features).
  • Run "start" to start the crawler and simulate a crawl. Output is written to the result.log
  • Use "stop" to stop the simulation
  • Run "exit" to shut down
  • Use "status" to observe the crawling process.

anthelion编译

标签:

原文地址:http://www.cnblogs.com/timdes/p/5171148.html

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!