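In section 3.2 we ran a small Lucene search demo (3.2 Lucene in Practice: A Simple Program) that performs basic Lucene searches. In practice, however, users' requirements vary widely, for example: finding several keywords that appear together as a phrase or close to one another, searching within a range, and matching inexactly. These are the first, second, and third requirements referred to below.
Let's now look at several of Lucene's advanced query types, which help us meet these varied requirements.
This material is split into two parts: this part covers five advanced query types, and the next part covers another five.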
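1. PhraseQuery
When searching, users often look for several keywords rather than a single word. The keywords may be adjacent, forming an exact phrase, or other unrelated words may sit between them. In both cases the user wants the matching documents found. From a scoring standpoint, though, documents in which unrelated words separate the keywords generally score lower than documents that contain the exact phrase.
PhraseQuery is the Query type Lucene provides for this need. You add keywords to it and then set the slop parameter, which controls whether unrelated words are allowed between the keywords and, if so, how many.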
package testAdvancedQuery;

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class testPhraseQuery {
    public static void indexSearch() {
        // try-with-resources closes the reader and directory even if the search fails
        try (Directory directory = FSDirectory.open(Paths.get("index3"));
             DirectoryReader reader = DirectoryReader.open(directory)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            Term t1 = new Term("key2", "孙悟空");
            Term t2 = new Term("key2", "猪八戒");
            // Arguments: slop, field, terms... The slop is the maximum distance
            // allowed between the given terms. Reference:
            // http://lucene.apache.org/core/6_2_1/core/org/apache/lucene/search/PhraseQuery.html
            PhraseQuery query = new PhraseQuery(5, "key2", t1.bytes(), t2.bytes());
            System.out.println(query.toString());
            TopDocs tds = searcher.search(query, 20);
            int cou = 0;
            for (ScoreDoc sd : tds.scoreDocs) {
                cou++;
                Document d = searcher.doc(sd.doc);
                String output = cou + ". " + d.get("category2") + "\n" + d.get("skey1") + "\n" + d.get("skey2");
                System.out.println(output);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        indexSearch();
    }
}
PhraseQuery has four constructors (see the PhraseQuery documentation). The one used in the demo code is:
PhraseQuery(int slop, String field, BytesRef... terms)
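slop is an int. It determines whether unrelated words are allowed between the keywords and, if so, how many.
field is a String: the field to search.
terms are BytesRef values: the keywords the user wants to search for.
So the first requirement can be met with PhraseQuery.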
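To make the slop parameter concrete, here is a minimal plain-Java sketch that does not use Lucene at all. It covers only the simplest case, two terms that appear in query order, where the slop needed equals the number of tokens sitting between them. Lucene's real slop is an edit distance over term positions and also handles reordered terms, so treat this as a simplified model. The class `SlopSketch`, its methods, and the sample tokens are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not Lucene code: for two terms that occur in
// query order, count the tokens between them. In this simplified model a
// phrase query matches when that count does not exceed the slop.
class SlopSketch {

    /**
     * Returns the smallest number of tokens found between an occurrence of
     * `first` and a later occurrence of `second`, or -1 if no such pair exists.
     */
    static int wordsBetween(List<String> tokens, String first, String second) {
        int best = -1;
        int lastFirst = -1; // position of the most recent `first` seen so far
        for (int i = 0; i < tokens.size(); i++) {
            String tok = tokens.get(i);
            if (tok.equals(second) && lastFirst >= 0) {
                int gap = i - lastFirst - 1;
                if (best < 0 || gap < best) {
                    best = gap;
                }
            }
            if (tok.equals(first)) {
                lastFirst = i;
            }
        }
        return best;
    }

    /** True if the pair matches under the given slop in this simplified model. */
    static boolean matchesWithSlop(List<String> tokens, String first, String second, int slop) {
        int gap = wordsBetween(tokens, first, second);
        return gap >= 0 && gap <= slop;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("the", "quick", "brown", "fox", "jumps");
        System.out.println(wordsBetween(doc, "the", "fox"));       // 2
        System.out.println(matchesWithSlop(doc, "the", "fox", 1)); // false
        System.out.println(matchesWithSlop(doc, "the", "fox", 5)); // true
    }
}
```

With slop 0 only adjacent terms match in this model, which corresponds to an exact phrase; the demo above uses slop 5, so up to five unrelated tokens may sit between the two terms.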
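2. RangeQuery (TermRangeQuery)
RangeQuery, implemented by the TermRangeQuery class in the demo below, runs range queries over strings; all terms in the index are sorted in lexicographic order. It lets the user search within a range, and the lower and upper bounds can each be specified as inclusive or exclusive.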
package testAdvancedQuery;

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermRangeQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class testRangeQuery {
    public static void indexSearch() {
        // try-with-resources closes the reader and directory even if the search fails
        try (Directory directory = FSDirectory.open(Paths.get("indexrangequery"));
             DirectoryReader reader = DirectoryReader.open(directory)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // ************* Test 1: range over yyyyMMdd date strings *************
            // Term begin = new Term("birthday", "19980101");
            // Term end = new Term("birthday", "20040606");
            // Query query = new TermRangeQuery("birthday", begin.bytes(), end.bytes(), false, false);
            // ************* Test 2: range over plain strings *************
            Term begin = new Term("lex", "ab");
            Term end = new Term("lex", "cd");
            // Both bounds are exclusive (false, false)
            Query query = new TermRangeQuery("lex", begin.bytes(), end.bytes(), false, false);
            System.out.println(query.toString());
            TopDocs tds = searcher.search(query, 20);
            ScoreDoc[] sds = tds.scoreDocs;
            System.out.println(sds.length);
            int cou = 0;
            for (ScoreDoc sd : sds) {
                cou++;
                Document d = searcher.doc(sd.doc);
                String output = cou + ". " + d.get("sname") + " " + d.get("sbirthday") + " " + d.get("sid") + " " + d.get("slex");
                System.out.println(output);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        indexSearch();
    }
}
The constructor is:
TermRangeQuery(String field, BytesRef lowerTerm, BytesRef upperTerm, boolean includeLower, boolean includeUpper)
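field names the field to search; lowerTerm and upperTerm are the lower and upper bounds of the range; the last two booleans specify whether each bound is inclusive (closed interval) or exclusive (open interval).
With this, the user's second requirement is easily handled.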
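The effect of lexicographic ordering and of the two inclusive flags can be checked with a short plain-Java sketch that does not use Lucene. The class `TermRangeSketch` and the method `inRange` are hypothetical names. Lucene compares terms as bytes, whereas this sketch uses `String.compareTo`; for the ASCII strings below the two orderings agree.

```java
// Hypothetical illustration, not Lucene code: a lexicographic range check
// with the same four knobs as TermRangeQuery (lower bound, upper bound,
// includeLower, includeUpper).
class TermRangeSketch {

    static boolean inRange(String term, String lower, String upper,
                           boolean includeLower, boolean includeUpper) {
        int lo = term.compareTo(lower); // > 0 means term sorts after lower
        int hi = term.compareTo(upper); // < 0 means term sorts before upper
        boolean aboveLower = includeLower ? lo >= 0 : lo > 0;
        boolean belowUpper = includeUpper ? hi <= 0 : hi < 0;
        return aboveLower && belowUpper;
    }

    public static void main(String[] args) {
        // Open interval ("ab", "cd"), as in the demo's second test
        System.out.println(inRange("abc", "ab", "cd", false, false)); // true
        System.out.println(inRange("ab", "ab", "cd", false, false));  // false: lower bound excluded
        System.out.println(inRange("ab", "ab", "cd", true, false));   // true: lower bound included
        // Fixed-width yyyyMMdd dates sort chronologically as strings
        System.out.println(inRange("20010315", "19980101", "20040606", false, false)); // true
        // Variable-width numbers do not: "9" sorts after "10"
        System.out.println(inRange("9", "1", "10", true, true)); // false
    }
}
```

The last line is the usual caveat for string ranges: values such as dates work only because they are stored in a fixed-width format whose string order matches their natural order.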
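3. FuzzyQuery
FuzzyQuery performs fuzzy matching based on edit distance, computed with the Damerau-Levenshtein algorithm. The edit distance between two strings is the minimum number of edit operations needed to turn one into the other.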
package testAdvancedQuery;

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class testFuzzyQuery {
    public static void indexSearch(String keywords) {
        // try-with-resources closes the reader and directory even if the search fails
        try (Directory directory = FSDirectory.open(Paths.get("index3"));
             DirectoryReader reader = DirectoryReader.open(directory)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // Single-argument constructor: uses FuzzyQuery's default settings
            FuzzyQuery query = new FuzzyQuery(new Term("key1", keywords));
            System.out.println(query.toString());
            TopDocs tds = searcher.search(query, 20);
            int cou = 0;
            for (ScoreDoc sd : tds.scoreDocs) {
                cou++;
                Document d = searcher.doc(sd.doc);
                String output = cou + ". " + d.get("category2") + "\n" + d.get("skey1");
                System.out.println(output);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        String[] keywords = {"流星", "眼睛", "小学生"};
        // Run one fuzzy search per keyword
        for (String keyword : keywords) {
            indexSearch(keyword);
        }
    }
}
FuzzyQuery has four constructors; see the FuzzyQuery documentation. The constructors are also explained well in a blog post (link not preserved).
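As a worked example of the edit distance that FuzzyQuery relies on, the following plain-Java sketch computes the Damerau-Levenshtein distance in its common "optimal string alignment" form, where the allowed operations are insertion, deletion, substitution, and swapping two adjacent characters. It is an illustration only: the class `EditDistanceSketch` is a hypothetical name, the code compares UTF-16 `char` values, and Lucene itself does not compute distances this way at query time.

```java
// Hypothetical illustration, not Lucene code: Damerau-Levenshtein distance
// (optimal string alignment variant) via dynamic programming.
class EditDistanceSketch {

    static int distance(String a, String b) {
        int n = a.length();
        int m = b.length();
        // d[i][j] = distance between the first i chars of a and the first j chars of b
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i; // delete all i chars
        for (int j = 0; j <= m; j++) d[0][j] = j; // insert all j chars
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1,   // deletion
                                 d[i][j - 1] + 1),  // insertion
                        d[i - 1][j - 1] + cost);    // substitution or match
                // Transposition of two adjacent characters counts as one edit
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
        System.out.println(distance("ab", "ba"));          // 1 (one transposition)
        System.out.println(distance("lucene", "lucnee"));  // 1 (one transposition)
    }
}
```

A fuzzy query accepts only terms whose distance to the query term stays within a small maximum (the Lucene 6.x documentation lists a default of 2 edits), so "lucnee" would be close enough to "lucene" while "kitten" and "sitting" would not match each other.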
4. WildcardQuery
WildcardQuery performs wildcard queries: the wildcard "?" stands for exactly one character, and "*" stands for zero or more characters. Usage is simple:
package testAdvancedQuery;

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class testWildCardQuery {
    public static void indexSearch(String keywords) {
        // try-with-resources closes the reader and directory even if the search fails
        try (Directory directory = FSDirectory.open(Paths.get("index3"));
             DirectoryReader reader = DirectoryReader.open(directory)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // The term text is the wildcard pattern
            WildcardQuery query = new WildcardQuery(new Term("key1", keywords));
            System.out.println(query.toString());
            TopDocs tds = searcher.search(query, 20);
            int cou = 0;
            for (ScoreDoc sd : tds.scoreDocs) {
                cou++;
                Document d = searcher.doc(sd.doc);
                String output = cou + ". " + d.get("category2") + "\n" + d.get("skey1");
                System.out.println(output);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        String[] keywords = {"流?雨", "星*", "小学*"};
        // Run one wildcard search per pattern
        for (String keyword : keywords) {
            indexSearch(keyword);
        }
    }
}
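For details, see the WildcardQuery documentation.
Because WildcardQuery and FuzzyQuery have to match a pattern against the terms of the field as strings, both come with some cost in search performance.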
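The matching rule for the two wildcards can be sketched in plain Java without Lucene. The matcher below handles only "?" (exactly one character) and "*" (zero or more characters); it ignores escaping, and it is not how Lucene evaluates the query internally. The class `WildcardSketch` and the method `matches` are hypothetical names.

```java
// Hypothetical illustration, not Lucene code: match a whole term against a
// pattern where '?' is exactly one character and '*' is zero or more.
class WildcardSketch {

    static boolean matches(String pattern, String text) {
        int p = 0;      // current position in pattern
        int t = 0;      // current position in text
        int starP = -1; // position of the last '*' seen in pattern
        int starT = -1; // text position that '*' is currently assumed to cover up to
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                // Remember the star and first try letting it match nothing
                starP = p;
                starT = t;
                p++;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (starP >= 0) {
                // Mismatch: backtrack and let the last '*' absorb one more character
                starT++;
                t = starT;
                p = starP + 1;
            } else {
                return false;
            }
        }
        // Only trailing '*' may remain in the pattern
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("te?t", "test"));      // true
        System.out.println(matches("te?t", "tet"));       // false: '?' needs one character
        System.out.println(matches("star*", "star"));     // true: '*' may match nothing
        System.out.println(matches("star*", "starlight")); // true
    }
}
```

The pattern must cover the whole term, which is why "te?t" does not match "tet". A pattern that begins with "*" forces every term to be examined, which is one reason wildcard searches can be slow.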
5. PrefixQuery
PrefixQuery matches documents that contain terms beginning with a specified string. Usage is simple:
package testAdvancedQuery;

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class testPrefixQuery {
    public static void indexSearch() {
        // try-with-resources closes the reader and directory even if the search fails
        try (Directory directory = FSDirectory.open(Paths.get("index3"));
             DirectoryReader reader = DirectoryReader.open(directory)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // Matches every term in field "key1" that starts with "中"
            PrefixQuery query = new PrefixQuery(new Term("key1", "中"));
            System.out.println(query.toString());
            TopDocs tds = searcher.search(query, 20);
            ScoreDoc[] sds = tds.scoreDocs;
            System.out.println(sds.length);
            int cou = 0;
            for (ScoreDoc sd : sds) {
                cou++;
                Document d = searcher.doc(sd.doc);
                String output = cou + ". " + d.get("category2") + "\n" + d.get("skey1") + "\n" + d.get("skey2");
                System.out.println(output);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        indexSearch();
    }
}
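For details, see the official PrefixQuery documentation.
FuzzyQuery, WildcardQuery and PrefixQuery above are all inexact queries, and they can address the user's third requirement.
The next part covers another five advanced query types.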
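Since the terms of an index are kept in sorted order, all terms that share a prefix sit next to one another. The plain-Java sketch below models that idea with a `TreeSet` standing in for the sorted term list; it is a simplified picture and not Lucene's implementation. The class `PrefixSketch`, the method `termsWithPrefix`, and the sample terms are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration, not Lucene code: collect all terms that start
// with a prefix from a sorted set of terms.
class PrefixSketch {

    static List<String> termsWithPrefix(TreeSet<String> sortedTerms, String prefix) {
        List<String> hits = new ArrayList<>();
        // tailSet jumps to the first term >= prefix; terms sharing the prefix
        // are contiguous from there, so we can stop at the first non-match
        for (String term : sortedTerms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(
                Arrays.asList("apt", "banana", "app", "apply", "apple"));
        System.out.println(termsWithPrefix(terms, "app")); // [app, apple, apply]
        System.out.println(termsWithPrefix(terms, "z"));   // []
    }
}
```

Compared with a pattern that starts with a wildcard, a prefix only has to visit the terms that actually match plus one more, which is why prefix lookups over sorted terms tend to be comparatively cheap.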
Original article: http://www.cnblogs.com/itcsl/p/6843228.html