标签:follow 12月 msi size mod 结合 ges webapps tco
分词的话,我们把“村村通工程 ”名词化,分词结果为:
<!-- 需要分词的字段 --> <field name="content" type="text_ik" indexed="true" stored="true" required="true" multiValued="false" /> <!-- 我添加的IK分词 --> <fieldType name="text_ik" class="solr.TextField"> <analyzer type="index" isMaxWordLength="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/> <analyzer type="query" isMaxWordLength="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/> </fieldType>
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd"> <properties> <comment>IK Analyzer 扩展配置</comment> <!-- 此处可配置扩展字典,例如你可以去下载搜狗的互联网词库 --> <entry key="ext_dict">ext.dic;</entry> <!--停止词字典--> <entry key="ext_stopwords">stopword.dic;</entry> </properties>
<fieldtype name="textComplex" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="E:/solr_home/connect/conf/dic/" /> <filter class="solr.StopFilterFactory" ignoreCase="false" words="stopwords.txt"/> <filter class="solr.WordDelimiterFilterFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="20"/> <filter class="solr.StandardFilterFactory"/> </analyzer> <analyzer type="query"> <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="E:/solr_home/connect/conf/dic/" /> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> <filter class="solr.StopFilterFactory" ignoreCase="false" words="stopwords.txt"/> <filter class="solr.WordDelimiterFilterFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <!-- <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/> --> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> </analyzer> </fieldtype>
配置词库
标签:follow 12月 msi size mod 结合 ges webapps tco
原文地址:http://www.cnblogs.com/logic-regulus/p/6474479.html