
A Lucene Analyzer Class Optimized for Tweets

Date: 2017-07-05 19:51:23


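/***
 * @author YangXin
 * @info A Lucene Analyzer tuned for tweets using DoubleMetaphone.
 * DoubleMetaphone produces the same key for words that sound alike.
 *
 * Written against the Lucene 3.0-era API (TermAttribute was removed in Lucene 4.0).
 * The PorterStemFilter and WhitespaceTokenizer imports assume the Lucene 3.x core
 * package layout.
 */
package unitTwelve;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import org.apache.commons.codec.language.DoubleMetaphone;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

public class TwitterAnalyzer extends Analyzer {
	private final DoubleMetaphone filter = new DoubleMetaphone();

	@Override
	public TokenStream tokenStream(String fieldName, Reader reader) {
		// Tokenize, drop English stop words, then apply Porter stemming.
		TokenStream result = new PorterStemFilter(new StopFilter(true,
				new StandardTokenizer(Version.LUCENE_CURRENT, reader),
				StandardAnalyzer.STOP_WORDS_SET));
		TermAttribute termAtt = (TermAttribute) result.addAttribute(TermAttribute.class);
		StringBuilder buf = new StringBuilder();
		try {
			while (result.incrementToken()) {
				// termBuffer() exposes the char[]; term() returns a String and cannot be sliced this way.
				String word = new String(termAtt.termBuffer(), 0, termAtt.termLength());
				// Replace each stemmed term with its phonetic key, encoded once.
				buf.append(filter.encode(word)).append(" ");
			}
		} catch (IOException e) {
			e.printStackTrace();
		}
		// Re-tokenize the space-separated keys so callers receive one token per key.
		return new WhitespaceTokenizer(new StringReader(buf.toString()));
	}
}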
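
To see why mapping each term to a phonetic key helps with tweet spellings, here is a minimal sketch that runs on the JDK alone. It is not the analyzer above: it swaps in a hand-written Soundex in place of Double Metaphone, skips Porter stemming, and uses a tiny made-up stop list. The names `PhoneticKeySketch`, `soundex`, `analyze`, and `STOP_WORDS` are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class PhoneticKeySketch {
    // Hypothetical, abbreviated stop list (assumption); Lucene ships its own set.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "for", "is", "of", "the", "to"));

    // Soundex digit for each letter A..Z: '0' marks vowels and Y, '-' marks H and W.
    private static final String CODES = "0123012-02245501262301-202";

    // American Soundex: first letter plus up to three digits, zero-padded to length 4.
    static String soundex(String word) {
        String s = word.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        char prev = 0;
        for (int i = 0; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            if (c < 'A' || c > 'Z') {
                continue; // ignore anything that is not a basic Latin letter
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);   // keep the first letter verbatim
                prev = code;
            } else if (code == '-') {
                // H and W are skipped and do not separate equal codes
            } else if (code == '0') {
                prev = '0';      // a vowel separates equal codes
            } else {
                if (code != prev) {
                    out.append(code);
                }
                prev = code;
            }
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    // Lowercase, split on non-letters, drop stop words, and emit one key per term.
    static String analyze(String text) {
        List<String> keys = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            keys.add(soundex(token));
        }
        return String.join(" ", keys);
    }

    public static void main(String[] args) {
        System.out.println(analyze("Thanks for the love")); // prints "T520 L100"
        System.out.println(analyze("thanx 4 the luv"));     // prints "T520 L100"
    }
}
```

Both inputs reduce to the same key sequence, so an index built on these keys would treat the two tweets as equivalent. Double Metaphone generally handles more spelling variation than Soundex, so the keys the real analyzer produces will differ from the ones shown here.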


Original article: http://www.cnblogs.com/ljbguanli/p/7122997.html
