Keyword Search Across Multiple Documents with Lucene: A Demo (Part 1)

Date: 2014-12-03 19:20:38

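Before starting the demo, read http://www.ibm.com/developerworks/cn/java/j-lo-lucene1/ to get familiar with some basic Lucene concepts. Ignore the code samples there, because they are quite dated; this post is meant to fill in exactly that code part.
Once the basic concepts are clear, we can start on the demo.
First, download the Lucene package from http://www.apache.org/dyn/closer.cgi/lucene/java/4.10.0. I am using 4.10, the latest version at the time of writing. Its API differs in many places from the Lucene demos found elsewhere online, so few working examples are available, and the only way to learn it is to read the official API documentation.
Lucene 4.10 API: http://lucene.apache.org/core/4_10_0/core/index.html
Unpack the downloaded archive and add the jars from the core and analysis directories to the project's classpath.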

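Next comes the coding.
First, build an index over all the documents in a given directory. The code is as follows (see the code comments for details):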

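	import java.io.File;
	import java.io.FileReader;
	import java.io.IOException;
	import java.io.Reader;
	import org.apache.lucene.analysis.Analyzer;
	import org.apache.lucene.analysis.standard.StandardAnalyzer;
	import org.apache.lucene.document.Document;
	import org.apache.lucene.document.Field;
	import org.apache.lucene.document.StringField;
	import org.apache.lucene.document.TextField;
	import org.apache.lucene.index.IndexWriter;
	import org.apache.lucene.index.IndexWriterConfig;
	import org.apache.lucene.store.FSDirectory;
	import org.apache.lucene.util.Version;

	public static void buildIndex(String idir, String dDir) throws IOException {
		File indexDir = new File(idir); // directory where the index is stored
		File dataDir = new File(dDir); // directory containing the files to index
		File[] dataFiles = dataDir.listFiles();
		if (dataFiles == null) { // listFiles() returns null if dDir is not a readable directory
			throw new IOException("Cannot list files in " + dataDir.getPath());
		}
		Analyzer luceneAnalyzer = new StandardAnalyzer(); // splits text into tokens
		IndexWriterConfig indexConfig = new IndexWriterConfig(Version.LATEST, luceneAnalyzer);
		FSDirectory fsDirectory = null;
		IndexWriter indexWriter = null;
		try {
			fsDirectory = FSDirectory.open(indexDir); // index directory
			indexWriter = new IndexWriter(fsDirectory, indexConfig); // object that writes the index
			long startTime = System.currentTimeMillis();
			for (File dataFile : dataFiles) {
				if (dataFile.isFile() && dataFile.getName().endsWith(".txt")) {
					Document document = new Document(); // represents one file
					// Note: FileReader decodes the file with the platform default charset
					Reader txtReader = new FileReader(dataFile);
					// Fields describe a document; this one has two: its path and its contents
					document.add(new StringField("path", dataFile.getCanonicalPath(), Field.Store.YES)); // stored as a single token
					document.add(new TextField("contents", txtReader)); // tokenized and indexed, not stored
					indexWriter.addDocument(document);
				}
			}
			indexWriter.commit(); // commit the added documents to the index
			long endTime = System.currentTimeMillis();
			System.out.println("It takes " + (endTime - startTime) + " milliseconds to create index for the files in directory " + dataDir.getPath());
		} catch (IOException e) {
			if (indexWriter != null) { // the writer may not exist yet if opening the index failed
				try {
					indexWriter.rollback(); // discard uncommitted changes
				} catch (IOException e1) {
					e.addSuppressed(e1);
				}
			}
			throw e; // let the caller see the failure instead of swallowing it
		} finally {
			if (indexWriter != null) {
				indexWriter.close();
			}
			if (fsDirectory != null) {
				fsDirectory.close();
			}
		}
	}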
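
To make it clearer what building an index buys us, here is a toy inverted index in plain Java. It is only a conceptual sketch under simplifying assumptions, not how Lucene stores its data: the class name `ToyInvertedIndex`, its methods, and the naive "lowercase and split on non-alphanumerics" tokenization are all made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration class; not part of Lucene.
class ToyInvertedIndex {
    // term -> sorted set of document ids (here: file paths) that contain the term
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Naive tokenization: lowercase, then split on anything that is not a letter or digit.
    void addDocument(String path, String contents) {
        for (String token : contents.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(path);
            }
        }
    }

    // Look up one exact term; returns an empty set when the term was never indexed.
    Set<String> lookup(String term) {
        return postings.getOrDefault(term, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("a.txt", "Lucene builds an inverted index");
        index.addDocument("b.txt", "An index maps terms to documents");
        System.out.println(index.lookup("index"));  // prints [a.txt, b.txt]
        System.out.println(index.lookup("lucene")); // prints [a.txt]
    }
}
```

The point of the structure is that a keyword lookup becomes a single map access instead of a scan over every file. A real Lucene index adds, among other things, a configurable analyzer, relevance scoring, and on-disk storage.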

Next, use the index that was just built to search for a keyword. The code is as follows:

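	import java.io.File;
	import java.io.IOException;
	import org.apache.lucene.document.Document;
	import org.apache.lucene.index.DirectoryReader;
	import org.apache.lucene.index.IndexReader;
	import org.apache.lucene.index.Term;
	import org.apache.lucene.search.IndexSearcher;
	import org.apache.lucene.search.ScoreDoc;
	import org.apache.lucene.search.TermQuery;
	import org.apache.lucene.search.TopDocs;
	import org.apache.lucene.store.FSDirectory;

	public static void search(String queryStr) throws IOException {
		File indexDir = new File("/home/luzhen/lucenceIndex");
		if (!indexDir.exists()) { // check before trying to open the index
			System.out.println("The Lucene index does not exist");
			return;
		}
		// try-with-resources closes the reader and the directory when done
		try (FSDirectory fsDirectory = FSDirectory.open(indexDir); // directory that holds the index
				IndexReader indexReader = DirectoryReader.open(fsDirectory)) {
			IndexSearcher searcher = new IndexSearcher(indexReader);
			// A Term is the basic unit of search. TermQuery does not analyze its input, so
			// queryStr must match an indexed token exactly (StandardAnalyzer lowercases tokens).
			Term term = new Term("contents", queryStr);
			TermQuery luceneQuery = new TermQuery(term); // the most basic query class
			TopDocs topDocs = searcher.search(luceneQuery, 10); // run the query, returning at most 10 hits
			// totalHits can exceed 10, so iterate over the hits actually returned
			for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
				Document docHit = searcher.doc(scoreDoc.doc);
				System.out.println(docHit.get("path")); // print the path of each matching file
			}
		}
	}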
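
One practical consequence of using `TermQuery`: the query string is not passed through the analyzer, while `StandardAnalyzer` lowercases tokens at index time, so a search for `Lucene` finds nothing even when the files contain that word. A minimal sketch of normalizing the keyword before building the `Term` might look like this. The class `QueryTerms` and the helper `normalizeTerm` are hypothetical names, and the helper only mirrors the lowercasing step, not the analyzer's full tokenization.

```java
import java.util.Locale;

// Hypothetical helper class; not part of Lucene.
class QueryTerms {
    // Trim and lowercase a single keyword so it can match lowercased indexed tokens.
    static String normalizeTerm(String raw) {
        return raw.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalizeTerm("  Lucene ")); // prints lucene
    }
}
```

In `search`, this would amount to writing `new Term("contents", normalizeTerm(queryStr))`. For multi-word or more complex input, parsing the query with the same analyzer used at index time (for example via Lucene's query parser module) is likely the more general route.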

Tags: lucene, index, search, java

Original article: http://blog.csdn.net/yukjin/article/details/41700167
