Solr (4): Using a MySQL Database as the Index Data Source

Posted: 2014-11-11 02:12:26

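The earlier posts in this Solr series imported data for tokenization and indexing either from local XML files or by entering XML directly on the admin page. In practice, however, the data very often comes from a database. This post therefore walks through the process in some detail, using MySQL as the example. The feature involved is "dataimport" (Solr's DataImportHandler).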

1. Add the following to conf\solrconfig.xml to enable data import

 <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">   
  <lst name="defaults">   
   <str name="config">data-config.xml</str>   
  </lst>   
  </requestHandler>

2. Add a data source file, data-config.xml, to the conf\ directory with the following content:

<dataConfig>
    <dataSource type="JdbcDataSource"
                driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://172.0.0.1:3306/cmntadmin"
                user="root"
                password=""/>
    <document name="content">
        <entity name="node" query="select id,username,creator from forbiduser">
            <field column="id" name="id" />
            <field column="username" name="name" />
            <field column="creator" name="contents" />
        </entity>
    </document>
</dataConfig>

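This file configures the data source. Adjust url, user, and password to your own MySQL server (the host above is written as 172.0.0.1; use 127.0.0.1 if the database runs on the same machine). The contents of the entity come from the result of its "query". Each field element maps one column of that result: "column" is the database column name, and "name" must match a field declared in schema.xml.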
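Because a mismatch between a "name" in data-config.xml and the fields in schema.xml is an easy mistake to make, a quick check can help before restarting the server. The following is a minimal sketch in Python, not part of Solr: the function name `unmapped_fields` is hypothetical, the two XML snippets are trimmed copies of the files in this post, and dynamic fields are not taken into account.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the two files from this post, inlined so the sketch
# runs on its own; in practice you would read them from the conf directory.
DATA_CONFIG = """
<dataConfig>
  <document name="content">
    <entity name="node" query="select id,username,creator from forbiduser">
      <field column="id" name="id" />
      <field column="username" name="name" />
      <field column="creator" name="contents" />
    </entity>
  </document>
</dataConfig>
"""

SCHEMA = """
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name="contents" type="text_ik" indexed="true" stored="true" />
  </fields>
</schema>
"""


def unmapped_fields(data_config_xml, schema_xml):
    """Return data-config field names that schema.xml does not declare.

    Hypothetical helper: it only compares <field name="..."> attributes
    and ignores dynamicField declarations.
    """
    schema_root = ET.fromstring(schema_xml.strip())
    config_root = ET.fromstring(data_config_xml.strip())
    schema_names = {field.get("name") for field in schema_root.iter("field")}
    config_names = [field.get("name") for field in config_root.iter("field")]
    return [name for name in config_names if name not in schema_names]


# An empty list means every mapped name is declared in the schema.
print(unmapped_fields(DATA_CONFIG, SCHEMA))
```

An empty result means every "name" used in data-config.xml has a matching field in the schema; any name that is returned needs either a schema field or a corrected mapping.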

3. Create schema.xml

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
<fields>
    <!-- If you remove this field, you must _also_ disable the update log in solrconfig.xml
      or Solr won't start. _version_ and update log are required for SolrCloud
   --> 
   <field name="_version_" type="long" indexed="true" stored="true"/>
   
   <!-- points to the root document of a block of nested documents. Required for nested
      document support, may be removed otherwise
   -->
   <field name="_root_" type="string" indexed="true" stored="false"/>
   <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /> 
    <field name="name" type="text_general" indexed="true" stored="true"/>
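    <!-- multiValued because contents receives both the creator column and the copyField value from name -->
    <field name="contents" type="text_ik" indexed="true" stored="true" multiValued="true"/>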
 </fields>
 <!-- Field to use to determine and enforce document uniqueness. 
      Unless this field is marked with required="false", it will be a required field
   -->
 <uniqueKey>id</uniqueKey>
 <!-- DEPRECATED: The defaultSearchField is consulted by various query parsers when
  parsing a query string that isn't explicit about the field.  Machine (non-user)
  generated queries are best made explicit, or they can use the "df" request parameter
  which takes precedence over this.
  Note: Un-commenting defaultSearchField will be insufficient if your request handler
  in solrconfig.xml defines "df", which takes precedence. That would need to be removed.-->
 <defaultSearchField>contents</defaultSearchField>
<copyField source="name" dest="contents"/>
<solrQueryParser defaultOperator="OR"/>
<types>
 <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
<fieldType name="text_ik" class="solr.TextField"> 
         <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/> 
 </fieldType>

 </types>
</schema>

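Key settings in schema.xml:
defaultSearchField names the field that is searched when a query does not specify one. Here it is contents:
<defaultSearchField>contents</defaultSearchField>
copyField copies the value of one field into another field. Here it copies name into contents, the default search field, so a search also matches what is in name. With these two settings, a query without an explicit field searches the values of both name and contents (but not id, which is not copied anywhere).
<copyField source="name" dest="contents"/>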
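To make the effect of copyField concrete, here is a simplified model in Python. It is only an illustration of the idea, not how Solr implements it: the function `apply_copy_fields` and the sample values are made up, and the real copy happens inside Solr at index time, before analysis.

```python
def apply_copy_fields(doc, copy_fields):
    """Simplified model of copyField: append source values to the destination.

    doc maps field names to lists of values; copy_fields is a list of
    (source, dest) pairs, mirroring <copyField source="..." dest="..."/>.
    """
    result = {name: list(values) for name, values in doc.items()}
    for source, dest in copy_fields:
        if source in doc:
            result.setdefault(dest, []).extend(doc[source])
    return result


# Hypothetical row from the forbiduser table after the data-config mapping.
doc = {"id": ["1"], "name": ["alice"], "contents": ["admin"]}
indexed = apply_copy_fields(doc, [("name", "contents")])
print(indexed["contents"])  # ['admin', 'alice']
```

In this model the destination ends up holding two values, its own and the copied one. That is why a copyField destination that also receives data of its own generally has to be declared multiValued="true" in the schema.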

4. Add the required jar files

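Because this post uses MySQL as the data source, the JDBC driver jar (mysql-connector.jar) is required. The dataimport feature also needs solr-dataimporthandler-4.7.2.jar and solr-dataimporthandler-extras-4.7.2.jar. These two do not have to be downloaded separately; they are already in the \dist directory of the Solr distribution.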

Copy these three jars into the lib directory of the Solr web application under Tomcat (webapps\solr\WEB-INF\lib).
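If you repeat this step on several machines, the copy can be scripted. The following is a small sketch in Python; the helper name `copy_jars` and the paths in the commented example are assumptions, so replace them with the locations on your own system.

```python
import shutil
from pathlib import Path


def copy_jars(jar_paths, lib_dir):
    """Copy each jar into lib_dir and return the copied file names."""
    lib = Path(lib_dir)
    lib.mkdir(parents=True, exist_ok=True)
    copied = []
    for jar in jar_paths:
        target = lib / Path(jar).name
        shutil.copy2(jar, target)  # copy2 also preserves file timestamps
        copied.append(target.name)
    return copied


# Example call with hypothetical paths (adjust before running):
# copy_jars(
#     [
#         r"C:\solr-4.7.2\dist\solr-dataimporthandler-4.7.2.jar",
#         r"C:\solr-4.7.2\dist\solr-dataimporthandler-extras-4.7.2.jar",
#         r"C:\downloads\mysql-connector.jar",
#     ],
#     r"C:\tomcat\webapps\solr\WEB-INF\lib",
# )
```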

5. Build the index

Restart Tomcat.

A) A full import can be triggered by requesting this URL:

http://localhost:8080/solr/dataimport?command=full-import

B) Alternatively, run the import from the "dataimport" module on the admin page.
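The URL in option A can also be requested from a script, for example from a scheduled job. Below is a minimal sketch in Python using only the standard library. The helper names `dataimport_url` and `run_command` are hypothetical, and the base URL is the one used in this post; the "status" command and the "clean" and "commit" parameters are standard DataImportHandler options, but check them against your Solr version.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def dataimport_url(solr_base, command, **params):
    """Build a DataImportHandler URL such as .../dataimport?command=full-import."""
    query = {"command": command}
    query.update(params)
    return solr_base.rstrip("/") + "/dataimport?" + urlencode(query)


def run_command(solr_base, command, **params):
    """Send the command to Solr and return the raw response body as text."""
    url = dataimport_url(solr_base, command, **params)
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


# Building the URLs needs no running server:
print(dataimport_url("http://localhost:8080/solr", "full-import"))
print(dataimport_url("http://localhost:8080/solr", "status"))

# With Tomcat running, the import itself could be started like this:
# print(run_command("http://localhost:8080/solr", "full-import", clean="true", commit="true"))
```

The import runs in the background on the server, so a script would typically call the "status" command afterwards to see whether it has finished.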

This article comes from the "会飞的蜗牛" blog. Please keep this attribution: http://flyingsnail.blog.51cto.com/5341669/1575075
