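The previous articles on Solr imported data for tokenization and indexing either by loading a local XML file or by pasting XML directly into the admin page. In practice, however, the data often comes from a database. This article therefore walks through a database import in some detail, using MySQL as the example. The feature it relies on is Solr's "dataimport" handler.
1. Add the following to conf\solrconfig.xml to enable the data import feature: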
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
2. Add a data source configuration file, data-config.xml, under the conf\ directory with the following content:
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://172.0.0.1:3306/cmntadmin"
              user="root"
              password=""/>
  <document name="content">
    <entity name="node" query="select id,username,creator from forbiduser">
      <field column="id" name="id" />
      <field column="username" name="name" />
      <field column="creator" name="contents" />
    </entity>
  </document>
</dataConfig>
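This file configures the data source. The content of the entity comes from the rows returned by its "query". Each field element maps one column of that result: "column" is the database column name, and "name" must match the name of a field defined in schema.xml.
3. Create schema.xml with the following content: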
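To make the column-to-name mapping concrete, here is a minimal Python sketch. It is not how DataImportHandler is implemented; it only mirrors the renaming that the three field elements describe. The sample row values and the helper `row_to_document` are made up for illustration.

```python
# Hypothetical illustration of the column -> name mapping in data-config.xml.
# This is NOT DataImportHandler's code; it only mirrors the renaming idea.

# column (database) -> name (schema.xml field), as configured in the <entity>
FIELD_MAPPING = {
    "id": "id",
    "username": "name",
    "creator": "contents",
}


def row_to_document(row):
    """Rename the columns of one query result row to Solr field names."""
    return {
        FIELD_MAPPING[column]: value
        for column, value in row.items()
        if column in FIELD_MAPPING
    }


# A made-up row, shaped like a result of
# "select id,username,creator from forbiduser".
sample_row = {"id": "1", "username": "alice", "creator": "admin"}
document = row_to_document(sample_row)
print(document)
```

If a "name" here had no matching field in schema.xml, the real import would not be able to index that value, which is why the two files have to agree.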
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
  <fields>
    <!-- If you remove this field, you must _also_ disable the update log in solrconfig.xml
         or Solr won't start. _version_ and update log are required for SolrCloud -->
    <field name="_version_" type="long" indexed="true" stored="true"/>
    <!-- points to the root document of a block of nested documents. Required for nested
         document support, may be removed otherwise -->
    <field name="_root_" type="string" indexed="true" stored="false"/>
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="name" type="text_general" indexed="true" stored="true"/>
    <field name="contents" type="text_ik" indexed="true" stored="true"/>
  </fields>
  <!-- Field to use to determine and enforce document uniqueness.
       Unless this field is marked with required="false", it will be a required field -->
  <uniqueKey>id</uniqueKey>
  <!-- DEPRECATED: The defaultSearchField is consulted by various query parsers when
       parsing a query string that isn't explicit about the field. Machine (non-user)
       generated queries are best made explicit, or they can use the "df" request parameter
       which takes precedence over this.
       Note: Un-commenting defaultSearchField will be insufficient if your request handler
       in solrconfig.xml defines "df", which takes precedence. That would need to be removed. -->
  <defaultSearchField>contents</defaultSearchField>
  <copyField source="name" dest="contents"/>
  <solrQueryParser defaultOperator="OR"/>
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_ik" class="solr.TextField">
      <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
    </fieldType>
  </types>
</schema>
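Important parts of schema.xml:
The defaultSearchField, combined with copyField, is what lets Solr search the values of several fields with one query. With the settings below, a query that does not name a field searches the values of both name and contents (id is not copied, so it is only searched when the query names it explicitly): <defaultSearchField>contents</defaultSearchField>
copyField copies the value of one field into another field. Here the value of name is copied into the default search field, contents, so that a search also matches what is in name. Because contents already receives the creator column, it may need multiValued="true" in schema.xml: Solr generally rejects a document when copyField would add a second value to a single-valued field.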
<copyField source="name" dest="contents"/>
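The effect of copyField on the default search field can be pictured with a small Python model. This is a simplification under stated assumptions: the destination field is treated as a list of values, and a plain substring test stands in for Solr's real analysis and matching. The helper names `apply_copy_field` and `matches_default_field` and the sample document are hypothetical.

```python
# Simplified model of <copyField source="name" dest="contents"/>.
# Assumptions: dest is a multi-valued field (a list), and "matching" is a
# plain substring test, not Solr's tokenized search.


def apply_copy_field(document, source, dest):
    """Return a copy of the document with the source value appended to dest."""
    result = dict(document)
    existing = result.get(dest, [])
    if not isinstance(existing, list):
        existing = [existing]
    values = list(existing)
    if source in result:
        values.append(result[source])
    result[dest] = values
    return result


def matches_default_field(document, term, default_field="contents"):
    """Check whether any value of the default search field contains the term."""
    return any(term in value for value in document.get(default_field, []))


# Made-up document, as it might look after the import step.
doc = {"id": "1", "name": "alice", "contents": "admin"}
indexed = apply_copy_field(doc, "name", "contents")
print(indexed["contents"])
print(matches_default_field(indexed, "alice"))
```

In this model a search for a user name succeeds against contents only because copyField added the value of name to it; id was never copied, so it is not reachable through the default field.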
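4. Add the required jar files
Because this article uses MySQL as the data source, the JDBC driver jar (mysql-connector.jar) is required. The dataimport feature also needs solr-dataimporthandler-4.7.2.jar and solr-dataimporthandler-extras-4.7.2.jar. These two jars do not need to be downloaded separately; they are already in the \dist directory.
Copy these three jars into the lib directory of the Solr web application under Tomcat (webapps\solr\WEB-INF\lib).
5. Build the index
Restart Tomcat.
A) Trigger a full index build through a URL: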
http://localhost:8080/solr/dataimport?command=full-import
B) Alternatively, run the import from the "dataimport" module on the admin page.
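The URL trigger from option A can also be scripted. The sketch below builds the request URL and, if called, sends it with Python's standard library. The base URL is the one from this article; the helpers `dataimport_url` and `run_command` are hypothetical names, and the script assumes Solr is reachable at that address without authentication. The "status" command is sent to the same handler to check on a running import.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from this article; adjust host, port, and path for your setup.
SOLR_BASE_URL = "http://localhost:8080/solr"


def dataimport_url(command, base_url=SOLR_BASE_URL, **params):
    """Build a /dataimport URL for the given command and extra parameters."""
    query = {"command": command}
    query.update(params)
    return f"{base_url}/dataimport?{urlencode(query)}"


def run_command(command, **params):
    """Send the command to Solr and return the raw response body as text."""
    with urlopen(dataimport_url(command, **params), timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URLs; call run_command("full-import") against a live server.
    print(dataimport_url("full-import"))
    print(dataimport_url("status"))
```

Keeping URL construction separate from the network call makes it possible to check the generated URL without a running Solr instance.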
This article comes from the "会飞的蜗牛" blog; please keep this attribution when reposting: http://flyingsnail.blog.51cto.com/5341669/1575075