Let me start by pasting the contents of the file:
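<?xml version="1.0" encoding="UTF-8" ?>
<!-- Copyright notice... -->
<!--
 This is the Solr schema file. It should be named "schema.xml" and placed in the
 solrhome/core/conf directory, or somewhere the classloader of the Solr webapp can find it.
 More information: http://wiki.apache.org/solr/SchemaXml

 Performance notes: the following can improve performance.
 - Set stored="false" on fields that only need to be searched, not returned.
 - Set indexed="false" on fields that are only returned and never searched.
 - Remove all copyField declarations you do not need.
 - For the best index size and index performance, set indexed="false" on all general
   text fields, use copyField to copy them into a single field, and search on that field.
 - Run the JVM in server mode, and use a higher logging level to avoid logging every request.
-->
<schema name="example-data-driven-schema" version="1.6">
<!-- Name and version of this configuration. -->
<!-- Valid attributes for fields:
-->
<!-- Field names should consist of alphanumeric or underscore characters only and should
     not start with a digit. This is not currently strictly enforced, but other field names
     will not have first-class support from all components, and backward compatibility is
     not guaranteed. Names with both leading and trailing underscores (e.g. _version_)
     are reserved. -->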
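<!-- A minimal sketch of the stored/indexed advice from the performance notes at the top of
     this file. The two field names are hypothetical and are not part of this schema.
     Searched but never returned:
       <field name="search_only_example" type="string" indexed="true" stored="false"/>
     Returned but never searched:
       <field name="display_only_example" type="string" indexed="false" stored="true"/>
-->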
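<!--
 In this data_driven_schema_configs configset, the following three fields are required:
 id, _version_, and _text_. All other fields can be removed or modified, and new ones
 can be added by hand in this XML file as needed.
 Note that many dynamic fields are also defined; you can use them to select a field's
 type through a field naming convention (see below).
 WARNING: the _text_ catch-all field will significantly increase the size of your index.
 If you do not need it, consider removing it along with the corresponding copyField directive.
-->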
<field name="id" type="long" indexed="true" stored="true" required="true"/>
<!-- Regular fields -->
<field name="informer_id" type="long" indexed="true" stored="false"/>
<field name="phone_number" type="string" indexed="true" stored="false"/>
<field name="title" type="string" indexed="true" stored="true" />
<field name="content" type="string" indexed="true" stored="true" />
<field name="latitude" type="string" indexed="true" stored="true" />
<field name="longitude" type="string" indexed="true" stored="true" />
<field name="attachment" type="string" indexed="true" stored="true" />
<field name="clue_status" type="int" indexed="true" stored="true" />
<field name="del_flag" type="int" indexed="true" stored="true" />
<field name="gmt_create" type="date" indexed="true" stored="true" />
<field name="create_uid" type="long" indexed="true" stored="true" />
<field name="gmt_modified" type="date" indexed="true" stored="true" />
<field name="modified_uid" type="long" indexed="true" stored="true" />
<!-- Reserved fields -->
<!--<field name="id" type="string" indexed="true" stored="true" multiValued="false" />-->
<field name="_version_" type="long" indexed="true" stored="false"/>
<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
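<!-- latitude and longitude are declared as plain strings above, so they are not directly
     usable for spatial filtering. A sketch of one alternative, assuming a hypothetical
     field named clue_location and the location_rpt type defined later in this file
     (values would be sent as "lat,lon", for example "39.9042,116.4074"):
       <field name="clue_location" type="location_rpt" indexed="true" stored="true"/>
-->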
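<!-- Copy fields -->
<!-- It is recommended to define one copy field that collects all full-text fields, so that
     they can be searched through a single field.
     Note on multiValued: if you copy a single source field and that source is multiValued,
     the destination must be multiValued as well. If you copy several source fields into one
     destination and any of them is multiValued, the destination must be multiValued too.
     (In practice, a destination fed by several sources is usually declared multiValued
     anyway, as _text_ is here.) Keep this in mind.
-->
<copyField source="*" dest="_text_"/>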
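<!-- source="*" copies every field, including the numeric and date fields, into _text_.
     A narrower sketch, assuming only title and content need full-text search; these two
     rules would replace the wildcard rule above:
       <copyField source="title" dest="_text_"/>
       <copyField source="content" dest="_text_"/>
-->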
<!-- Dynamic fields -->
<!-- Dynamic fields take the same attributes as regular fields. The main difference is that
     the name attribute may contain a wildcard: with name="*_i", for example, any field whose
     name ends in _i matches. This way, fields that are not declared explicitly can still be
     matched and indexed. -->
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
<dynamicField name="*_is" type="ints" indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true" />
<dynamicField name="*_ss" type="strings" indexed="true" stored="true"/>
<dynamicField name="*_l" type="long" indexed="true" stored="true"/>
<dynamicField name="*_ls" type="longs" indexed="true" stored="true"/>
<dynamicField name="*_t" type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>
<dynamicField name="*_bs" type="booleans" indexed="true" stored="true"/>
<dynamicField name="*_f" type="float" indexed="true" stored="true"/>
<dynamicField name="*_fs" type="floats" indexed="true" stored="true"/>
<dynamicField name="*_d" type="double" indexed="true" stored="true"/>
<dynamicField name="*_ds" type="doubles" indexed="true" stored="true"/>
<!-- Field types -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true" />
<fieldType name="strings" class="solr.StrField" sortMissingLast="true" multiValued="true" docValues="true" />
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldType name="booleans" class="solr.BoolField" sortMissingLast="true" multiValued="true"/>
<!-- The sortMissingLast and sortMissingFirst attributes are optional and are currently
     supported on types that are sorted internally as strings and on numeric types.
     This includes "string", "boolean", and, as of 3.5 (and 4.x), int, float, long, date,
     double, including the "Trie" variants.
     - If sortMissingLast="true", then a sort on this field will cause documents without
       the field to come after documents with the field, regardless of the requested
       sort order (asc or desc).
     - If sortMissingFirst="true", then a sort on this field will cause documents without
       the field to come before documents with the field, regardless of the requested
       sort order.
     - If sortMissingLast="false" and sortMissingFirst="false" (the default), then default
       lucene sorting will be used which places docs without the field first in an
       ascending sort and last in a descending sort.
-->
<!-- Default numeric field types. For range queries, consider the
     tint/tfloat/tlong/tdouble types. These fields support doc values, but should be
     single-valued.
-->
<fieldType name="int" class="solr.TrieIntField" docValues="true" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" docValues="true" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" docValues="true" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" docValues="true" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="ints" class="solr.TrieIntField" docValues="true" precisionStep="0" positionIncrementGap="0" multiValued="true"/>
<fieldType name="floats" class="solr.TrieFloatField" docValues="true" precisionStep="0" positionIncrementGap="0" multiValued="true"/>
<fieldType name="longs" class="solr.TrieLongField" docValues="true" precisionStep="0" positionIncrementGap="0" multiValued="true"/>
<fieldType name="doubles" class="solr.TrieDoubleField" docValues="true" precisionStep="0" positionIncrementGap="0" multiValued="true"/>
<!-- Precision-step variants: each value is indexed at several levels of precision -->
<fieldType name="tint" class="solr.TrieIntField" docValues="true" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" docValues="true" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tlong" class="solr.TrieLongField" docValues="true" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" docValues="true" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tints" class="solr.TrieIntField" docValues="true" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
<fieldType name="tfloats" class="solr.TrieFloatField" docValues="true" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
<fieldType name="tlongs" class="solr.TrieLongField" docValues="true" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
<fieldType name="tdoubles" class="solr.TrieDoubleField" docValues="true" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
<!-- Date format. Note: -->
<fieldType name="date" class="solr.TrieDateField" docValues="true" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="dates" class="solr.TrieDateField" docValues="true" precisionStep="0" positionIncrementGap="0" multiValued="true"/>
<!-- A trie-based date field for date range queries and date faceting -->
<fieldType name="tdate" class="solr.TrieDateField" docValues="true" precisionStep="6" positionIncrementGap="0"/>
<fieldType name="tdates" class="solr.TrieDateField" docValues="true" precisionStep="6" positionIncrementGap="0" multiValued="true"/>
<fieldType name="binary" class="solr.BinaryField"/>
<!-- The "RandomSortField" is not used to store or search any data. You can declare fields
     of this type in your schema to generate pseudo-random orderings of your docs for
     sorting or function purposes. The ordering is generated based on the field name and
     the version of the index. As long as the index version remains unchanged, and the same
     field name is reused, the ordering of the docs will be consistent. If you want
     different pseudo-random orderings of documents, for the same version of the index,
     use a dynamicField and change the field name in the request.
-->
<fieldType name="random" class="solr.RandomSortField" indexed="true" />
<!-- solr.TextField allows the specification of custom text analyzers specified as a
     tokenizer and a list of token filters. Different analyzers may be specified for
     indexing and querying. The optional positionIncrementGap puts space between multiple
     fields of this type on the same document, with the purpose of preventing false phrase
     matching across fields. For more info on customizing your analyzer chain, please see
     http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
-->
<!-- One can also specify an existing Analyzer class that has a default constructor via the
     class attribute on the analyzer element.
     Example:
     <fieldType name="text_greek" class="solr.TextField">
       <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
     </fieldType>
-->
<!-- A text field that only splits on whitespace for exact matching of words -->
<dynamicField name="*_ws" type="text_ws" indexed="true" stored="true"/>
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
<!-- A general text field that has reasonable, generic cross-language defaults: it tokenizes
     with StandardTokenizer, removes stop words from case-insensitive "stopwords.txt"
     (empty by default), and down cases. At query time only, it also applies synonyms. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<!-- A text field with defaults appropriate for English: it tokenizes with StandardTokenizer,
     removes English stop words (lang/stopwords_en.txt), down cases, protects words from
     protwords.txt, and finally applies Porter's stemming. The query time analyzer also
     applies synonyms from synonyms.txt.
-->
<dynamicField name="*_txt_en" type="text_en" indexed="true" stored="true"/>
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
<!-- A text field with defaults appropriate for English, plus aggressive word-splitting and
     autophrase features enabled. This field is just like text_en, except it adds
     WordDelimiterFilter to enable splitting and matching of words on case-change, alpha
     numeric boundaries, and non-alphanumeric chars.
     This means certain compound word cases will work, for example query "wi fi" will match
     document "WiFi" or "wi-fi".
-->
<dynamicField name="*_txt_en_split" type="text_en_splitting" indexed="true" stored="true"/>
<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
<!-- Less flexible matching, but less false matches. Probably not ideal for product names,
     but may be good for SKUs. Can insert dashes in the wrong place and still match.
-->
<dynamicField name="*_txt_en_split_tight" type="text_en_splitting_tight" indexed="true" stored="true"/>
<fieldType name="text_en_splitting_tight" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <!-- this filter can remove any duplicate tokens that appear at the same position;
         sometimes possible with WordDelimiterFilter in conjunction with stemming. -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
<!-- Just like text_general except it reverses the characters of each token, to enable more
     efficient leading wildcard queries.
-->
<dynamicField name="*_txt_rev" type="text_general_rev" indexed="true" stored="true"/>
<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<dynamicField name="*_phon_en" type="phonetic_en" indexed="true" stored="true"/>
<fieldType name="phonetic_en" stored="false" indexed="true" class="solr.TextField" >
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
</fieldType>
<!-- lowercases the entire field value, keeping it as a single token.
-->
<dynamicField name="*_s_lower" type="lowercase" indexed="true" stored="true"/>
<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
<!-- Example of using PathHierarchyTokenizerFactory at index time, so queries for paths
     match documents at that path, or in descendent paths -->
<dynamicField name="*_descendent_path" type="descendent_path" indexed="true" stored="true"/>
<fieldType name="descendent_path" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
  </analyzer>
</fieldType>
<!-- Example of using PathHierarchyTokenizerFactory at query time, so queries for paths
     match documents at that path, or in ancestor paths -->
<dynamicField name="*_ancestor_path" type="ancestor_path" indexed="true" stored="true"/>
<fieldType name="ancestor_path" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
  </analyzer>
</fieldType>
<!-- since fields of this type are by default not stored or indexed, any data added to them
     will be ignored outright. -->
<fieldType name="ignored" stored="false" indexed="false" docValues="false" multiValued="true" class="solr.StrField" />
<!-- This point type indexes the coordinates as separate fields (subFields).
     If subFieldType is defined, it references a type, and a dynamic field definition is
     created matching *___<typename>. Alternately, if subFieldSuffix is defined, that is
     used to create the subFields.
     Example: if subFieldType="double", then the coordinates would be indexed in fields
     myloc_0___double,myloc_1___double.
     Example: if subFieldSuffix="_d" then the coordinates would be indexed in fields
     myloc_0_d,myloc_1_d
     The subFields are an implementation detail of the fieldType, and end users normally
     should not need to know about them.
-->
<dynamicField name="*_point" type="point" indexed="true" stored="true"/>
<fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>
<!-- A specialized field for geospatial search. If indexed, this fieldType must not be multivalued. -->
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<!-- An alternative geospatial field type new to Solr 4. It supports multiValued and polygon
     shapes. For more information about this and other Spatial fields new to Solr 4, see:
     http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 -->
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType" geo="true" distErrPct="0.025" maxDistErr="0.001" distanceUnits="kilometers" />
<!-- Money/currency field type. See http://wiki.apache.org/solr/MoneyFieldType
     Parameters:
       defaultCurrency: Specifies the default currency if none specified.
                        Defaults to "USD"
       precisionStep: Specifies the precisionStep for the TrieLong field used for the amount
       providerClass: Lets you plug in other exchange provider backend:
         solr.FileExchangeRateProvider is the default and takes one parameter:
           currencyConfig: name of an xml file holding exchange rates
         solr.OpenExchangeRatesOrgProvider uses rates from openexchangerates.org:
           ratesFileLocation: URL or path to rates JSON file (default latest.json on the web)
           refreshInterval: Number of minutes between each rates fetch (default: 1440, min: 60)
-->
<fieldType name="currency" class="solr.CurrencyField" precisionStep="8" defaultCurrency="USD" currencyConfig="currency.xml" />
<!-- some examples for different languages (generally ordered by ISO code) -->
<!-- Armenian -->
<dynamicField name="*_txt_hy" type="text_hy" indexed="true" stored="true"/>
<fieldType name="text_hy" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_hy.txt" />
    <filter class="solr.SnowballPorterFilterFactory" language="Armenian"/>
  </analyzer>
</fieldType>
</schema>
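To see what the dynamic field rules in this schema buy you, here is a minimal sketch of an XML update message whose field names simply follow the suffix conventions. The names priority_i, labels_ss and summary_txt and all of the values are made up for illustration, and I am assuming the message is posted to the core's update handler:

```xml
<add>
  <doc>
    <!-- declared explicitly in the schema -->
    <field name="id">1001</field>
    <field name="title">Sample clue</field>
    <!-- matches *_i, so it is handled as an int -->
    <field name="priority_i">3</field>
    <!-- matches *_ss (multiValued strings), so it may appear more than once -->
    <field name="labels_ss">urgent</field>
    <field name="labels_ss">verified</field>
    <!-- matches *_txt, so it is analyzed as text_general -->
    <field name="summary_txt">Free text that should be tokenized</field>
  </doc>
</add>
```

None of the three suffixed fields has to be declared one by one. Because of the copyField rule with source="*", their values should also be copied into _text_.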
Solr and Me (5): A Detailed Look at the Configuration in schema.xml
Original article: http://www.cnblogs.com/DASOU/p/5907645.html