Elasticsearch pinyin + ik analysis, and pinyin analysis with Spring Data Elasticsearch

Date: 2018-10-04 17:24:58

Custom analyzers in Elasticsearch

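Installing the pinyin and ik analysis plugins

Pinyin plugin: https://github.com/medcl/elasticsearch-analysis-pinyin/releases

ik plugin: https://github.com/medcl/elasticsearch-analysis-ik/releases

If you download the source code, you need to package it with Maven yourself.

If you download a prebuilt archive, unzip it directly into the plugins folder under the Elasticsearch installation directory. The unzipped folder can be renamed.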

1. Configuring analysis in ES

Create the index and add the analysis settings

PUT myindex
{
  "settings": {
    "index":{
      "analysis":{
        "analyzer":{
          "ik_pinyin_analyzer":{
            "type":"custom",
            "tokenizer":"ik_smart",
            "filter":"pinyin_filter"
          }
        },
        "filter":{
          "pinyin_filter":{
            "type":"pinyin",
            "keep_separate_first_letter" : false,
            "keep_full_pinyin" : true,
            "keep_original" : false,
            "limit_first_letter_length" : 10,
            "lowercase" : true,
            "remove_duplicated_term" : true
          }
        }
      }
    }
  }
}
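
To check that the custom analyzer was registered on the index, the _analyze endpoint can be called with it. This is only a sketch: the sample text 刘德华 is an arbitrary choice, and the tokens returned depend on the installed plugin versions and the ik dictionary.

```
GET myindex/_analyze
{
  "analyzer": "ik_pinyin_analyzer",
  "text": "刘德华"
}
```

If the request fails because the analyzer cannot be found, the settings above were probably not applied when the index was created.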

Add the mapping for the fields

PUT myindex/_mapping/users
{
  "properties": {
    "uname":{
      "type": "text",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart",
      "fields": {
        "my_pinyin":{
          "type": "text",
          "analyzer": "ik_pinyin_analyzer",
          "search_analyzer": "ik_pinyin_analyzer"
        }
      }
    },
    "age":{
      "type": "integer"
    }
  }
}
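
With this mapping, the pinyin sub-field is addressed as uname.my_pinyin in queries. A minimal sketch of a match query follows, assuming a document whose uname contains 刘德华 has already been indexed. The search string "liu" is illustrative, and whether a given input matches depends on how the installed plugin versions tokenize it.

```
GET myindex/_search
{
  "query": {
    "match": {
      "uname.my_pinyin": "liu"
    }
  }
}
```

Querying uname itself instead of uname.my_pinyin goes through ik_smart only, so pinyin input would not be expected to match there.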

2. Configuring analysis with Spring Data Elasticsearch

Create the entity class

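import java.util.List;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Mapping;
import org.springframework.data.elasticsearch.annotations.Setting;

@Mapping(mappingPath = "elasticsearch_mapping.json") // load the mapping from this resource file
@Setting(settingPath = "elasticsearch_setting.json") // load the index settings from this resource file
@Document(indexName = "myindex", type = "users")
public class User {
    @Id
    private Integer id;

    // Setting the analyzer through @Field had no effect:
    // @Field(type = FieldType.keyword, analyzer = "pinyin_analyzer", searchAnalyzer = "pinyin_analyzer")
    private String name1;
    @Field(type = FieldType.keyword)
    private String userName;
    @Field(type = FieldType.Nested)
    private List<Product> products;

}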

Create the file elasticsearch_mapping.json under resources

{
  "properties": {
    "uname": {
      "type": "text",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart",
      "fields": {
        "my_pinyin": {
          "type": "text",
          "analyzer": "ik_pinyin_analyzer",
          "search_analyzer": "ik_pinyin_analyzer"
        }
      }
    },
    "age": {
      "type": "integer"
    }
  }
}

Create the file elasticsearch_setting.json under resources

{
  "index": {
    "analysis": {
      "analyzer": {
        "ik_pinyin_analyzer": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "filter": "pinyin_filter"
        }
      },
      "filter": {
        "pinyin_filter": {
          "type": "pinyin",
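          // true: keep the first-letter abbreviation
          "keep_first_letter":true,
          // false: do not emit each first letter as a separate term
          "keep_separate_first_letter": false,
          // true: keep the full pinyin
          "keep_full_pinyin": true,
          "keep_original": false,
          // maximum length of the first-letter result
          "limit_first_letter_length": 10,
          // lowercase non-Chinese letters
          "lowercase": true,
          // duplicated terms are removed
          "remove_duplicated_term": true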
        }
      }
    }
  }
}
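
To make the effect of these options more concrete, here is a toy Java model of what the filter emits for one token. It is a sketch under stated assumptions, not the plugin's implementation: the class PinyinOptionsSketch, the method terms, and the three-character lookup table are all hypothetical, duplicates are always dropped (as with remove_duplicated_term set to true), the lowercase option is ignored, and the real plugin may order or position its tokens differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

/** Toy model of the pinyin filter options. NOT the plugin's implementation. */
class PinyinOptionsSketch {

    // Hypothetical lookup table with just enough entries for the demo:
    // U+5218 (liu), U+5FB7 (de), U+534E (hua).
    private static final Map<Character, String> PINYIN =
            Map.of('\u5218', "liu", '\u5fb7', "de", '\u534e', "hua");

    static List<String> terms(String token,
                              boolean keepFirstLetter,
                              boolean keepSeparateFirstLetter,
                              boolean keepFullPinyin,
                              boolean keepOriginal,
                              int limitFirstLetterLength) {
        // A LinkedHashSet drops duplicate terms while keeping insertion order.
        LinkedHashSet<String> out = new LinkedHashSet<>();
        StringBuilder firstLetters = new StringBuilder();
        for (char c : token.toCharArray()) {
            String py = PINYIN.get(c);
            if (py == null) {
                continue; // characters outside the toy table are skipped
            }
            if (keepFullPinyin) {
                out.add(py); // full pinyin of each character
            }
            if (keepSeparateFirstLetter) {
                out.add(py.substring(0, 1)); // each first letter as its own term
            }
            firstLetters.append(py.charAt(0));
        }
        if (keepOriginal) {
            out.add(token); // the untouched input token
        }
        if (keepFirstLetter && firstLetters.length() > 0) {
            // Joined first letters, truncated to the configured maximum length.
            int end = Math.min(limitFirstLetterLength, firstLetters.length());
            out.add(firstLetters.substring(0, end));
        }
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        // Same option values as in elasticsearch_setting.json.
        System.out.println(terms("\u5218\u5fb7\u534e", true, false, true, false, 10));
    }
}
```

With the option values from the settings file, the toy model prints [liu, de, hua, ldh] for the input 刘德华: the full pinyin of each character plus the joined first letters.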

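ik_max_word: splits the text at the finest granularity and exhausts every possible combination. For example, 「中华人民共和国国歌」 is split into 「中华人民共和国、中华人民、中华、华人、人民共和国、人民、人、民、共和国、共和、和、国国、国歌」.
ik_smart: splits the text at the coarsest granularity. For example, 「中华人民共和国国歌」 is split into 「中华人民共和国、国歌」.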
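
The difference between the two ik analyzers can be checked directly with the _analyze endpoint. The two requests below are a sketch that assumes the ik plugin is installed. The exact tokens returned depend on the plugin version and its dictionary.

```
GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "中华人民共和国国歌"
}

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "中华人民共和国国歌"
}
```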

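After the application starts, the analyzers have still not been applied to the index.

Once the entity has been created, the following call must be added so that the created index actually uses the analyzers: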

elasticsearchTemplate.putMapping(User.class);

Original article: https://www.cnblogs.com/double-yuan/p/9742567.html
