Apache Lucene™ is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform applications.
https://lucene.apache.org/core/
Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene™.
http://lucene.apache.org/solr/
Apache Nutch is a highly extensible and scalable open source web crawler software project that stems from Apache Lucene.
http://nutch.apache.org/
Compass is an open source project built on top of Lucene that aims to simplify the integration of search into any Java application.
http://www.compass-project.org/
The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
Tika is a project of the Apache Software Foundation, and was formerly a subproject of Apache Lucene.
http://tika.apache.org/
Original article: http://www.cnblogs.com/shengyang/p/4682390.html