Preface
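Anyone who works on search has probably run into Lucene at some point. It is open source and easy to pick up, and the official API is enough to write small demos. It uses an inverted index to make retrieval fast. This post gives simple implementations of the basic operations: incrementally adding documents to an index, deleting from it, querying by keyword, and updating it.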
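To make the inverted-index idea concrete, here is a toy in-memory sketch in plain Java. No Lucene is involved. The class name ToyInvertedIndex and its crude tokenizer are assumptions made for illustration. Each term maps to the set of documents that contain it, so a lookup is a single map access instead of a scan over every document. Lucene's real index is far more elaborate: on-disk segments, term dictionaries, and pluggable analyzers.

```java
import java.util.*;

// Toy inverted index: term -> sorted set of ids of the documents that contain it.
// Illustration only; this is not Lucene's API or on-disk format.
class ToyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    private int nextId = 0;

    // Crude tokenizer (assumption): lowercase, split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Adds a document, records each of its terms in the postings, and returns the new doc id.
    int add(String text) {
        int id = nextId++;
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // One map lookup instead of scanning every document.
    SortedSet<Integer> search(String term) {
        SortedSet<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? new TreeSet<Integer>() : Collections.unmodifiableSortedSet(hits);
    }

    public static void main(String[] args) {
        ToyInvertedIndex idx = new ToyInvertedIndex();
        idx.add("hello,man!");
        idx.add("goodbye,man!");
        idx.add("hello,woman!");
        idx.add("goodbye,woman!");
        System.out.println(idx.search("hello")); // [0, 2]
        System.out.println(idx.search("man"));   // [0, 1]
    }
}
```

Searching for "man" does not return the "woman" documents. Matching is done on whole terms, not substrings.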
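What bugs your humble blogger so far is this: to run full-text search over file contents, you have to write the file-reading code yourself (Solr does this for you for free). Index creation is also fairly slow, and there is a lot of room for optimization. That will take some careful study.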
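Lucene only indexes text that you hand it, so the reading step has to be written by hand. Below is a minimal sketch that assumes plain UTF-8 .txt files sitting directly in one directory. The class TextFileLoader and the method loadTextFiles are hypothetical names. The sketch produces file name / content pairs, which could then be stored in the "filename" and "content" fields used later in this post.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

// Hypothetical helper: Lucene does not open files for you, so read them into strings first.
class TextFileLoader {
    // Reads every regular *.txt file directly under dir (no recursion) as UTF-8 text.
    // Returns file name -> content, sorted by file name.
    static Map<String, String> loadTextFiles(Path dir) throws IOException {
        Map<String, String> result = new TreeMap<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path file : stream) {
                if (Files.isRegularFile(file)) {
                    String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                    result.put(file.getFileName().toString(), content);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("docs");
        Files.write(dir.resolve("text1.txt"), "hello,man!".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("notes.md"), "skipped".getBytes(StandardCharsets.UTF_8));
        System.out.println(loadTextFiles(dir)); // {text1.txt=hello,man!}
    }
}
```

Binary formats such as PDF or Word would also need a text extractor. This sketch deliberately handles plain text only.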
The previous post covered the overall flow of index creation in Lucene. Here is a quick recap:
Directory directory = FSDirectory.open(new File("/tmp/testindex"));
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);
IndexWriter iwriter = new IndexWriter(directory, config);
Document doc = new Document();
String text = "This is the text to be indexed.";
doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
iwriter.addDocument(doc); // hand the document to the writer
iwriter.close();          // close() commits the changes
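1. Create a Directory pointing at the index location.
2. Create an analyzer and an IndexWriter.
3. Create a Document, add fields holding the data, and add it to the IndexWriter.
4. Close the IndexWriter, which commits the changes.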
/**
 * Build the index
 *
 * @throws Exception
 */
public static void index() throws Exception {
    // analyzer, directory, indexWriter and INDEX_DIR are static fields of the enclosing class (declaration not shown)
    String text1 = "hello,man!";
    String text2 = "goodbye,man!";
    String text3 = "hello,woman!";
    String text4 = "goodbye,woman!";

    Date date1 = new Date();
    analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
    directory = FSDirectory.open(new File(INDEX_DIR));

    IndexWriterConfig config = new IndexWriterConfig(
            Version.LUCENE_CURRENT, analyzer);
    indexWriter = new IndexWriter(directory, config);

    // one Document per text: "filename" identifies it, "content" holds the text; both are stored
    Document doc1 = new Document();
    doc1.add(new TextField("filename", "text1", Store.YES));
    doc1.add(new TextField("content", text1, Store.YES));
    indexWriter.addDocument(doc1);

    Document doc2 = new Document();
    doc2.add(new TextField("filename", "text2", Store.YES));
    doc2.add(new TextField("content", text2, Store.YES));
    indexWriter.addDocument(doc2);

    Document doc3 = new Document();
    doc3.add(new TextField("filename", "text3", Store.YES));
    doc3.add(new TextField("content", text3, Store.YES));
    indexWriter.addDocument(doc3);

    Document doc4 = new Document();
    doc4.add(new TextField("filename", "text4", Store.YES));
    doc4.add(new TextField("content", text4, Store.YES));
    indexWriter.addDocument(doc4);

    indexWriter.commit();
    indexWriter.close();

    Date date2 = new Date();
    System.out.println("Index creation took: " + (date2.getTime() - date1.getTime()) + "ms\n");
}
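Lucene supports incremental indexing. New documents can be added without affecting the existing index, and Lucene automatically merges the index files at an appropriate time.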
/**
 * Add to the index (incremental)
 *
 * @throws Exception
 */
public static void insert() throws Exception {
    String text5 = "hello,goodbye,man,woman";
    Date date1 = new Date();
    analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
    directory = FSDirectory.open(new File(INDEX_DIR));

    // IndexWriterConfig's default OpenMode is CREATE_OR_APPEND, so the existing index is opened and appended to
    IndexWriterConfig config = new IndexWriterConfig(
            Version.LUCENE_CURRENT, analyzer);
    indexWriter = new IndexWriter(directory, config);

    Document doc1 = new Document();
    doc1.add(new TextField("filename", "text5", Store.YES));
    doc1.add(new TextField("content", text5, Store.YES));
    indexWriter.addDocument(doc1);

    indexWriter.commit();
    indexWriter.close();

    Date date2 = new Date();
    System.out.println("Adding to the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
}
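The "add without disturbing the old index, merge later" behaviour can be pictured with a toy model in plain Java. Every commit writes a new immutable segment, and old segments are never rewritten. Once there are too many segments, they are merged into one. The class name and the fixed threshold of 3 segments are assumptions. Lucene's real merge policies (for example TieredMergePolicy) choose what to merge based on segment sizes and are considerably more involved.

```java
import java.util.*;

// Toy model of incremental indexing with segment merging (not Lucene's implementation).
class ToySegmentedIndex {
    static final int MAX_SEGMENTS = 3; // hypothetical merge trigger

    // Each inner list is one immutable segment of documents.
    final List<List<String>> segments = new ArrayList<>();

    // Commit a batch of new documents as a new segment; existing segments stay untouched.
    void commit(List<String> newDocs) {
        segments.add(Collections.unmodifiableList(new ArrayList<>(newDocs)));
        if (segments.size() > MAX_SEGMENTS) {
            mergeAll();
        }
    }

    // Replace all segments by a single segment holding every document.
    private void mergeAll() {
        List<String> merged = new ArrayList<>();
        for (List<String> seg : segments) merged.addAll(seg);
        segments.clear();
        segments.add(Collections.unmodifiableList(merged));
    }

    int docCount() {
        int n = 0;
        for (List<String> seg : segments) n += seg.size();
        return n;
    }

    public static void main(String[] args) {
        ToySegmentedIndex idx = new ToySegmentedIndex();
        idx.commit(Arrays.asList("hello,man!", "goodbye,man!"));
        idx.commit(Arrays.asList("hello,woman!", "goodbye,woman!"));
        idx.commit(Arrays.asList("hello,goodbye,man,woman"));
        System.out.println("segments=" + idx.segments.size() + ", docs=" + idx.docCount()); // segments=3, docs=5
        idx.commit(Arrays.asList("one more"));
        System.out.println("segments=" + idx.segments.size() + ", docs=" + idx.docCount()); // segments=1, docs=6
    }
}
```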
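Deleting also goes through IndexWriter, via its deleteDocuments method. Deleting by a term removes every document that contains that term. If you only want to delete a single document, it is best to define a unique ID field and delete by that ID.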
/**
 * Delete from the index
 *
 * @param str the term to delete by (matched against the "filename" field)
 * @throws Exception
 */
public static void delete(String str) throws Exception {
    Date date1 = new Date();
    analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
    directory = FSDirectory.open(new File(INDEX_DIR));

    IndexWriterConfig config = new IndexWriterConfig(
            Version.LUCENE_CURRENT, analyzer);
    indexWriter = new IndexWriter(directory, config);

    // removes every document whose "filename" field contains this term
    indexWriter.deleteDocuments(new Term("filename", str));

    // close() commits the deletion
    indexWriter.close();

    Date date2 = new Date();
    System.out.println("Deleting from the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
}
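Why a unique ID field matters can be shown with a toy model in plain Java. Deleting by a common term such as "hello" removes every document containing it, while a unique id removes exactly one. As in Lucene, a delete here only marks the document as dead (a BitSet). Lucene physically drops deleted documents when segments are merged. The class ToyDeletes and its method names are hypothetical. In real Lucene 4.x code, the ID would typically be indexed as an untokenized StringField, e.g. doc.add(new StringField("id", "1", Store.YES)), and deleted with indexWriter.deleteDocuments(new Term("id", "1")).

```java
import java.util.*;

// Toy model of delete-by-term with delete markers (names are hypothetical, not Lucene's API).
class ToyDeletes {
    private final List<Map<String, String>> docs = new ArrayList<>();
    private final BitSet deleted = new BitSet(); // marks dead documents, like Lucene's live-docs bits

    // "id" is kept as one exact term (similar in spirit to StringField);
    // other fields are lowercased and split into words.
    private static Set<String> terms(String field, String value) {
        if (field.equals("id")) return Collections.singleton(value);
        Set<String> out = new HashSet<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    void add(String id, String content) {
        Map<String, String> doc = new HashMap<>();
        doc.put("id", id);
        doc.put("content", content);
        docs.add(doc);
    }

    // Marks every live document whose field contains the term as deleted and returns how many.
    // A linear scan keeps the sketch short; a real index would look the term up in its postings.
    int deleteDocuments(String field, String term) {
        int count = 0;
        for (int i = 0; i < docs.size(); i++) {
            String value = docs.get(i).get(field);
            if (!deleted.get(i) && value != null && terms(field, value).contains(term)) {
                deleted.set(i);
                count++;
            }
        }
        return count;
    }

    int liveDocs() {
        return docs.size() - deleted.cardinality();
    }

    public static void main(String[] args) {
        ToyDeletes idx = new ToyDeletes();
        idx.add("1", "hello,man!");
        idx.add("2", "goodbye,man!");
        idx.add("3", "hello,woman!");
        idx.add("4", "goodbye,woman!");
        System.out.println(idx.deleteDocuments("content", "hello")); // 2
        System.out.println(idx.deleteDocuments("id", "2"));          // 1
        System.out.println(idx.liveDocs());                          // 1
    }
}
```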
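Lucene has no true in-place update. updateDocument appears to update the index entry matching a given field value, but in fact it deletes the matching document(s) first and then adds the new document.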
/**
 * Update the index
 *
 * @throws Exception
 */
public static void update() throws Exception {
    String text1 = "update,hello,man!";
    Date date1 = new Date();
    analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
    directory = FSDirectory.open(new File(INDEX_DIR));

    IndexWriterConfig config = new IndexWriterConfig(
            Version.LUCENE_CURRENT, analyzer);
    indexWriter = new IndexWriter(directory, config);

    Document doc1 = new Document();
    doc1.add(new TextField("filename", "text1", Store.YES));
    doc1.add(new TextField("content", text1, Store.YES));

    // deletes the document(s) whose "filename" contains "text1", then adds doc1
    indexWriter.updateDocument(new Term("filename", "text1"), doc1);

    indexWriter.close();

    Date date2 = new Date();
    System.out.println("Updating the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
}
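IndexWriter.updateDocument(Term, Document) first deletes the documents containing the term and then adds the new document. The Lucene javadoc describes this delete-then-add as atomic as seen by a reader. The toy model below (plain Java, hypothetical names) shows what follows from that. The old entry is only marked deleted, and the new version is appended under a new internal document number. Internal doc numbers should therefore not be treated as stable IDs.

```java
import java.util.*;

// Toy model of "update = delete + add" (names are hypothetical, not Lucene's API).
class ToyUpdate {
    final List<Map<String, String>> docs = new ArrayList<>();
    final BitSet deleted = new BitSet(); // delete markers; nothing is edited in place

    // Appends a copy of the document and returns its internal doc number.
    int addDocument(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
        return docs.size() - 1;
    }

    // Exact-match delete on one field value, standing in for deleteDocuments(Term).
    void deleteDocuments(String field, String value) {
        for (int i = 0; i < docs.size(); i++) {
            if (value.equals(docs.get(i).get(field))) deleted.set(i);
        }
    }

    // Delete everything matching the term, then add the new document.
    int updateDocument(String field, String value, Map<String, String> newDoc) {
        deleteDocuments(field, value);
        return addDocument(newDoc);
    }

    // Content of the live document whose field equals value, or null if none.
    String liveContent(String field, String value) {
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i) && value.equals(docs.get(i).get(field))) {
                return docs.get(i).get("content");
            }
        }
        return null;
    }

    static Map<String, String> doc(String filename, String content) {
        Map<String, String> d = new HashMap<>();
        d.put("filename", filename);
        d.put("content", content);
        return d;
    }

    public static void main(String[] args) {
        ToyUpdate idx = new ToyUpdate();
        idx.addDocument(doc("text1", "hello,man!"));
        idx.addDocument(doc("text2", "goodbye,man!"));
        int newId = idx.updateDocument("filename", "text1", doc("text1", "update,hello,man!"));
        System.out.println(newId);                                // 2
        System.out.println(idx.deleted.get(0));                   // true
        System.out.println(idx.liveContent("filename", "text1")); // update,hello,man!
    }
}
```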
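Lucene offers many kinds of queries, which I won't go into in detail here. A search returns an array of ScoreDoc objects (the scoreDocs of the returned TopDocs), which plays a role similar to a JDBC ResultSet. For each hit, we load the document and read the fields we want by field name.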
/**
 * Keyword search
 *
 * @param str the query string, parsed against the "content" field
 * @throws Exception
 */
public static void search(String str) throws Exception {
    directory = FSDirectory.open(new File(INDEX_DIR));
    analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
    DirectoryReader ireader = DirectoryReader.open(directory);
    IndexSearcher isearcher = new IndexSearcher(ireader);

    QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "content", analyzer);
    Query query = parser.parse(str);

    // no Filter (null), at most 1000 hits, best score first
    ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
    for (int i = 0; i < hits.length; i++) {
        // load the stored fields of the hit by its internal doc number
        Document hitDoc = isearcher.doc(hits[i].doc);
        System.out.println(hitDoc.get("filename"));
        System.out.println(hitDoc.get("content"));
    }
    ireader.close();
    directory.close();
}
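The shape of what isearcher.search(...) returns can be mimicked in plain Java. It is an array of (doc number, score) pairs sorted best first, and stored fields are then fetched by doc number, much as isearcher.doc(hits[i].doc) does above. This is a toy: the score is simply the number of query terms found in "content", not Lucene's actual similarity formula, and the class names are hypothetical. A document matches if it contains any query term, which mirrors QueryParser's default OR operator.

```java
import java.util.*;

// Toy model of a scored result list (not Lucene's API or scoring).
class ToySearch {
    // Same shape as Lucene's ScoreDoc: an internal doc number plus a score.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Stored fields per document, indexed by doc number.
    final List<Map<String, String>> stored = new ArrayList<>();

    void add(String filename, String content) {
        Map<String, String> doc = new HashMap<>();
        doc.put("filename", filename);
        doc.put("content", content);
        stored.add(doc);
    }

    // Lowercase and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Returns at most n hits, best score first; equal scores keep doc-number order.
    Hit[] search(String query, int n) {
        List<String> queryTerms = tokenize(query);
        List<Hit> hits = new ArrayList<>();
        for (int d = 0; d < stored.size(); d++) {
            List<String> docTerms = tokenize(stored.get(d).get("content"));
            float score = 0;
            for (String t : queryTerms) {
                if (docTerms.contains(t)) score++;
            }
            if (score > 0) hits.add(new Hit(d, score));
        }
        hits.sort((a, b) -> Float.compare(b.score, a.score)); // List.sort is stable
        return hits.subList(0, Math.min(n, hits.size())).toArray(new Hit[0]);
    }

    // Counterpart of isearcher.doc(hit.doc).get(field).
    String get(int doc, String field) {
        return stored.get(doc).get(field);
    }

    public static void main(String[] args) {
        ToySearch s = new ToySearch();
        s.add("text1", "hello,man!");
        s.add("text2", "goodbye,man!");
        s.add("text3", "hello,woman!");
        s.add("text4", "goodbye,woman!");
        s.add("text5", "hello,goodbye,man,woman");
        // prints text1 2.0, text5 2.0, text2 1.0, text3 1.0 (one per line)
        for (Hit h : s.search("hello man", 1000)) {
            System.out.println(s.get(h.doc, "filename") + " " + h.score);
        }
    }
}
```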
http://www.cnblogs.com/xing901022/p/3933675.html
Original post: http://www.cnblogs.com/1130136248wlxk/p/4998947.html