标签:lucene join查询 一步一步跟我学lucene
了解sql的朋友都知道,我们在查询的时候可以采用join查询,即对有一定关联关系的对象进行联合查询来对多维的数据进行整理。这个联合查询的方式挺方便的,跟我们现实生活中的托人找关系类似,我们想要完成一件事,先找自己的熟人,然后通过熟人在一次找到其他,最终通过这种手段找到想要联系到的人。有点类似于”世间万物皆有联系“的感觉。
lucene的join包提供了索引时join和查询时join的功能;
大意是索引时join提供了查询时join的支持,且IndexWriter.addDocuments()方法调用时被join的documents以单个document块存储索引。索引时join对普通文本内容(如xml文档或数据库表)是方便可用的。特别是对类似于数据库的那种多表关联的情况,我们需要对提供关联关系的列提供join支持;
在索引时join的时候,索引中的documents被分割成parent documents(每个索引块的最后一个document)和child documents (除了parent documents外的所有documents). 由于lucene并不记录doc块的信息,我们需要提供一个Filter来标示parent documents。
在搜索结果的时候,我们利用ToParentBlockJoinQuery来从child query到parent document space来remap/join对应的结果。
如果我们只关注匹配查询条件的parent documents,我们可以用任意的collector来采集匹配到的parent documents;如果我们还想采集匹配parent document查询条件的child documents,我们就需要利用ToParentBlockJoinCollector来进行查询;一旦查询完成,我们可以利用ToParentBlockJoinCollector.getTopGroups()来获取匹配条件的TopGroups.
查询时join是基于索引词,其实现有两步:
查询时join接收一下输入参数:
通常查询时join的实现类似于如下:
String fromField = "from"; // Name of the from field boolean multipleValuesPerDocument = false; // Set only yo true in the case when your fromField has multiple values per document in your index String toField = "to"; // Name of the to field ScoreMode scoreMode = ScoreMode.Max // Defines how the scores are translated into the other side of the join. Query fromQuery = new TermQuery(new Term("content", searchTerm)); // Query executed to collect from values to join to the to values Query joinQuery = JoinUtil.createJoinQuery(fromField, multipleValuesPerDocument, toField, fromQuery, fromSearcher, scoreMode); TopDocs topDocs = toSearcher.search(joinQuery, 10); // Note: toSearcher can be the same as the fromSearcher // Render topDocs...
这里我们模拟6组数据,示例代码如下
package com.lucene.index.test; import static org.junit.Assert.assertEquals; import java.nio.file.Paths; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.standard.StandardAnalyzer; import org.apache.lucene.document.Document; import org.apache.lucene.document.Field; import org.apache.lucene.document.SortedDocValuesField; import org.apache.lucene.document.TextField; import org.apache.lucene.index.DirectoryReader; import org.apache.lucene.index.IndexReader; import org.apache.lucene.index.IndexWriter; import org.apache.lucene.index.IndexWriterConfig; import org.apache.lucene.index.IndexWriterConfig.OpenMode; import org.apache.lucene.index.Term; import org.apache.lucene.search.IndexSearcher; import org.apache.lucene.search.Query; import org.apache.lucene.search.TermQuery; import org.apache.lucene.search.TopDocs; import org.apache.lucene.search.join.JoinUtil; import org.apache.lucene.search.join.ScoreMode; import org.apache.lucene.store.Directory; import org.apache.lucene.store.FSDirectory; import org.apache.lucene.util.BytesRef; import org.junit.Test; public class TestJoin { @Test public void testSimple() throws Exception { final String idField = "id"; final String toField = "productId"; Directory dir = FSDirectory.open(Paths.get("index")); Analyzer analyzer = new StandardAnalyzer(); IndexWriterConfig config = new IndexWriterConfig(analyzer); config.setOpenMode(OpenMode.CREATE); IndexWriter w = new IndexWriter(dir, config); // 0 Document doc = new Document(); doc.add(new TextField("description", "random text", Field.Store.YES)); doc.add(new TextField("name", "name1", Field.Store.YES)); doc.add(new TextField(idField, "1", Field.Store.YES)); doc.add(new SortedDocValuesField(idField, new BytesRef("1"))); w.addDocument(doc); // 1 Document doc1 = new Document(); doc1.add(new TextField("price", "10.0", Field.Store.YES)); doc1.add(new TextField(idField, "2", Field.Store.YES)); doc1.add(new SortedDocValuesField(idField, new BytesRef("2"))); doc1.add(new TextField(toField, "1", Field.Store.YES)); doc1.add(new SortedDocValuesField(toField, new BytesRef("1"))); w.addDocument(doc1); // 2 Document doc2 = new Document(); doc2.add(new TextField("price", "20.0", Field.Store.YES)); doc2.add(new TextField(idField, "3", Field.Store.YES)); doc2.add(new SortedDocValuesField(idField, new BytesRef("3"))); doc2.add(new TextField(toField, "1", Field.Store.YES)); doc2.add(new SortedDocValuesField(toField, new BytesRef("1"))); w.addDocument(doc2); // 3 Document doc3 = new Document(); doc3.add(new TextField("description", "more random text", Field.Store.YES)); doc3.add(new TextField("name", "name2", Field.Store.YES)); doc3.add(new TextField(idField, "4", Field.Store.YES)); doc3.add(new SortedDocValuesField(idField, new BytesRef("4"))); w.addDocument(doc3); // 4 Document doc4 = new Document(); doc4.add(new TextField("price", "10.0", Field.Store.YES)); doc4.add(new TextField(idField, "5", Field.Store.YES)); doc4.add(new SortedDocValuesField(idField, new BytesRef("5"))); doc4.add(new TextField(toField, "4", Field.Store.YES)); doc4.add(new SortedDocValuesField(toField, new BytesRef("4"))); w.addDocument(doc4); // 5 Document doc5 = new Document(); doc5.add(new TextField("price", "20.0", Field.Store.YES)); doc5.add(new TextField(idField, "6", Field.Store.YES)); doc5.add(new SortedDocValuesField(idField, new BytesRef("6"))); doc5.add(new TextField(toField, "4", Field.Store.YES)); doc5.add(new SortedDocValuesField(toField, new BytesRef("4"))); w.addDocument(doc5); //6 Document doc6 = new Document(); doc6.add(new TextField(toField, "4", Field.Store.YES)); doc6.add(new SortedDocValuesField(toField, new BytesRef("4"))); w.addDocument(doc6); w.commit(); w.close(); IndexReader reader = DirectoryReader.open(dir); IndexSearcher indexSearcher = new IndexSearcher(reader); // Search for product Query joinQuery = JoinUtil.createJoinQuery(idField, false, toField, new TermQuery(new Term("name", "name2")), indexSearcher, ScoreMode.None); System.out.println(joinQuery); TopDocs result = indexSearcher.search(joinQuery, 10); System.out.println("查询到的匹配数据:"+result.totalHits); joinQuery = JoinUtil.createJoinQuery(idField, false, toField, new TermQuery(new Term("name", "name1")), indexSearcher, ScoreMode.None); result = indexSearcher.search(joinQuery, 10); System.out.println("查询到的匹配数据:"+result.totalHits); // Search for offer joinQuery = JoinUtil.createJoinQuery(toField, false, idField, new TermQuery(new Term("id", "5")), indexSearcher, ScoreMode.None); result = indexSearcher.search(joinQuery, 10); System.out.println("查询到的匹配数据:"+result.totalHits); indexSearcher.getIndexReader().close(); dir.close(); } }
程序的运行结果如下:
查询到的匹配数据:3 查询到的匹配数据:2 查询到的匹配数据:1
以第一个查询为例:
我们在查询的时候先根据name=name2这个查询条件找到记录为doc3的document,由于查询的是toField匹配的,我们在根据doc3找到其toField的值为4,然后查询条件变为productId:4,找出除本条记录外的其他数据,结果正好为3,符合条件。
一步一步跟我学习lucene是对近期做lucene索引的总结,大家有问题的话联系本人的Q-Q: 891922381,同时本人新建Q-Q群:106570134(lucene,solr,netty,hadoop),如蒙加入,不胜感激,大家共同探讨,本人争取每日一博,希望大家持续关注,会带给大家惊喜的
一步一步跟我学习lucene(18)---lucene索引时join和查询时join使用示例
标签:lucene join查询 一步一步跟我学lucene
原文地址:http://blog.csdn.net/wuyinggui10000/article/details/46336417