Tags: rename, size, you, analyzer, impact, distributed systems, system configuration, invocation, conversion
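A node is an Elasticsearch instance:
Every node has a name, set in the configuration file or at startup with -E node.name=es01
After starting, every node generates a UID, which is stored in its data directory
The node that handles a request is called the Coordinating Node
By default, every node is a Coordinating Node
Setting all other node roles to false turns a node into a dedicated Coordinating Node
A node that can store data is called a Data Node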
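Responsibilities of a Data Node
Scaling out by adding data nodes
Responsibilities of the Master Node
Master Node best practices
Cluster state: the essential information maintained for a cluster
The cluster state is stored on every node
However, only the master node can modify the cluster state, and it is responsible for synchronizing it to the other nodes
Nodes ping each other, and the node with the lowest Node ID becomes the elected node
The other nodes join the cluster without taking on the master role; as soon as they detect that the elected node is lost, they elect a new master node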
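Split-brain is a classic network problem in distributed systems: when the network fails, a node can no longer connect to the other nodes
How to avoid split-brain
Restrict elections with a quorum: an election can only take place when at least a quorum of master-eligible nodes is available
From 7.0 onward this setting is no longer needed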
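The quorum rule above can be sketched in a few lines. This is only an illustration: it assumes the usual majority formula, quorum = (master-eligible nodes / 2) + 1, which the notes above do not spell out, and the function names are hypothetical. Before 7.0 the value was configured through discovery.zen.minimum_master_nodes.

```python
def quorum(master_eligible: int) -> int:
    """Majority of the master-eligible nodes (assumed formula: n // 2 + 1)."""
    return master_eligible // 2 + 1


def can_elect(visible_master_eligible: int, total_master_eligible: int) -> bool:
    """A partition may elect a master only if it still sees a quorum."""
    return visible_master_eligible >= quorum(total_master_eligible)


# Three master-eligible nodes split by a network fault into groups of 2 and 1.
print(quorum(3))        # 2
print(can_elect(2, 3))  # True: the majority side keeps or elects a master
print(can_elect(1, 3))  # False: the isolated node cannot elect itself
```

Because at most one partition can hold a majority, two masters cannot be elected at the same time.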
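Shards are the cornerstone of Elasticsearch's distributed storage
Primary shards distribute the data across all nodes
How to plan the number of primary and replica shards for an index
A document is stored on one specific primary shard and its replica shards; for example, document 1 is stored on shards P0 and R0
Document-to-shard mapping algorithm:
Document-to-shard routing algorithm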
shard = hash(_routing) % number_of_primary_shards
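A minimal sketch of the routing formula, assuming zlib.crc32 as a stand-in hash (Elasticsearch uses its own hash function internally) and a hypothetical route helper. By default _routing is the document _id. The sketch compares the shard chosen for the same documents with 5 and with 6 primary shards, which hints at why the number of primary shards is fixed when the index is created.

```python
import zlib


def route(routing: str, number_of_primary_shards: int) -> int:
    """shard = hash(_routing) % number_of_primary_shards.

    zlib.crc32 is only a stand-in hash for illustration (assumption).
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


doc_ids = [str(i) for i in range(1, 11)]
with_5 = {doc_id: route(doc_id, 5) for doc_id in doc_ids}
with_6 = {doc_id: route(doc_id, 6) for doc_id in doc_ids}

# Documents whose target shard changes when the shard count changes.
moved = [doc_id for doc_id in doc_ids if with_5[doc_id] != with_6[doc_id]]
print(with_5)
print(f"{len(moved)} of {len(doc_ids)} documents would map to a different shard")
```

Any document that maps to a different shard could no longer be found by its _id, so changing the shard count generally means reindexing.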
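What is a shard in ES
Some questions:
Immutability of the inverted index
ES Flush & Lucene Commit
An Elasticsearch search runs in two steps:
Step 1: Query
Step 2: Fetch
Potential problems with Query Then Fetch
Performance problems:
Relevance scoring
ES is a distributed system by design: the data for a query is spread over multiple shards on multiple machines, yet ES still has to return the results sorted (by relevance score)
Consider a query with From=990, Size=10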
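A rough sketch of why this query is expensive, assuming 5 shards and made-up scores (both are assumptions for illustration). In the query phase each shard has to return its top from + size candidates, and the coordinating node merges all of them before it can cut out the requested page.

```python
import heapq
import random


def query_phase(shard_scores, from_, size):
    """Each shard returns only its top (from + size) scores."""
    return heapq.nlargest(from_ + size, shard_scores)


def search(shards, from_, size):
    """Coordinating node: merge the per-shard candidates, then slice the page."""
    candidates = []
    for shard_scores in shards:
        candidates.extend(query_phase(shard_scores, from_, size))
    merged = sorted(candidates, reverse=True)
    return merged[from_:from_ + size], len(candidates)


random.seed(0)  # made-up scores, for illustration only
shards = [[random.random() for _ in range(5000)] for _ in range(5)]

page, sorted_on_coordinator = search(shards, from_=990, size=10)
print(len(page))              # 10 results are returned to the client
print(sorted_on_coordinator)  # 5000 candidates were merged to find them
```

The work grows with number of shards × (from + size), so the deeper the page, the more the coordinating node has to sort.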
Single-value analysis
Multi-value analysis
Demo
Generate the data
# Define the mapping of the employees index
PUT /employees/
{
  "mappings": {
    "properties": {
      "age": { "type": "integer" },
      "gender": { "type": "keyword" },
      "job": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 50 } } },
      "name": { "type": "keyword" },
      "salary": { "type": "integer" }
    }
  }
}
# Insert the data
PUT /employees/_bulk
{ "index" : { "_id" : "1" } }
{ "name" : "Emma","age":32,"job":"Product Manager","gender":"female","salary":35000 }
{ "index" : { "_id" : "2" } }
{ "name" : "Underwood","age":41,"job":"Dev Manager","gender":"male","salary": 50000}
{ "index" : { "_id" : "3" } }
{ "name" : "Tran","age":25,"job":"Web Designer","gender":"male","salary":18000 }
{ "index" : { "_id" : "4" } }
{ "name" : "Rivera","age":26,"job":"Web Designer","gender":"female","salary": 22000}
{ "index" : { "_id" : "5" } }
{ "name" : "Rose","age":25,"job":"QA","gender":"female","salary":18000 }
{ "index" : { "_id" : "6" } }
{ "name" : "Lucy","age":31,"job":"QA","gender":"female","salary": 25000}
{ "index" : { "_id" : "7" } }
{ "name" : "Byrd","age":27,"job":"QA","gender":"male","salary":20000 }
{ "index" : { "_id" : "8" } }
{ "name" : "Foster","age":27,"job":"Java Programmer","gender":"male","salary": 20000}
{ "index" : { "_id" : "9" } }
{ "name" : "Gregory","age":32,"job":"Java Programmer","gender":"male","salary":22000 }
{ "index" : { "_id" : "10" } }
{ "name" : "Bryant","age":20,"job":"Java Programmer","gender":"male","salary": 9000}
{ "index" : { "_id" : "11" } }
{ "name" : "Jenny","age":36,"job":"Java Programmer","gender":"female","salary":38000 }
{ "index" : { "_id" : "12" } }
{ "name" : "Mcdonald","age":31,"job":"Java Programmer","gender":"male","salary": 32000}
{ "index" : { "_id" : "13" } }
{ "name" : "Jonthna","age":30,"job":"Java Programmer","gender":"female","salary":30000 }
{ "index" : { "_id" : "14" } }
{ "name" : "Marshall","age":32,"job":"Javascript Programmer","gender":"male","salary": 25000}
{ "index" : { "_id" : "15" } }
{ "name" : "King","age":33,"job":"Java Programmer","gender":"male","salary":28000 }
{ "index" : { "_id" : "16" } }
{ "name" : "Mccarthy","age":21,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : { "_id" : "17" } }
{ "name" : "Goodwin","age":25,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : { "_id" : "18" } }
{ "name" : "Catherine","age":29,"job":"Javascript Programmer","gender":"female","salary": 20000}
{ "index" : { "_id" : "19" } }
{ "name" : "Boone","age":30,"job":"DBA","gender":"male","salary": 30000}
{ "index" : { "_id" : "20" } }
{ "name" : "Kathy","age":29,"job":"DBA","gender":"female","salary": 20000}
Test examples
# Metric aggregation: find the lowest salary
POST employees/_search
{ "size": 0, "aggs": { "min_salary": { "min": { "field": "salary" } } } }
# Response
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 20, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "min_salary" : { "value" : 9000.0 } } }
# Metric aggregation: find the highest salary
POST employees/_search
{ "size": 0, "aggs": { "max_salary": { "max": { "field": "salary" } } } }
# Response
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 20, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "max_salary" : { "value" : 50000.0 } } }
# Several metric aggregations at once: lowest, highest and average salary
POST employees/_search
{ "size": 0, "aggs": { "max_salary": { "max": { "field": "salary" } }, "min_salary": { "min": { "field": "salary" } }, "avg_salary": { "avg": { "field": "salary" } } } }
# Response
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 20, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "max_salary" : { "value" : 50000.0 }, "avg_salary" : { "value" : 24700.0 }, "min_salary" : { "value" : 9000.0 } } }
# A single aggregation that outputs several values: stats
POST employees/_search
{ "size": 0, "aggs": { "stats_salary": { "stats": { "field": "salary" } } } }
# Response
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 20, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "stats_salary" : { "count" : 20, "min" : 9000.0, "max" : 50000.0, "avg" : 24700.0, "sum" : 494000.0 } } }
Bucket aggregations
Documents are assigned to different buckets according to a rule in order to classify them; ES provides the common Bucket Aggregations
Terms Aggregation
# Terms aggregation on the keyword sub-field of job
POST employees/_search
{ "size": 0, "aggs": { "jobs": { "terms": { "field": "job.keyword" } } } }
# Response
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 20, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "jobs" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
  { "key" : "Java Programmer", "doc_count" : 7 },
  { "key" : "Javascript Programmer", "doc_count" : 4 },
  { "key" : "QA", "doc_count" : 3 },
  { "key" : "DBA", "doc_count" : 2 },
  { "key" : "Web Designer", "doc_count" : 2 },
  { "key" : "Dev Manager", "doc_count" : 1 },
  { "key" : "Product Manager", "doc_count" : 1 }
] } } }
Aggregating on a text field requires enabling fielddata
# Enable fielddata on the text field so that it supports the terms aggregation
PUT employees/_mapping
{ "properties" : { "job": { "type": "text", "fielddata": true } } }
# Terms aggregation on the text field; the buckets are the analyzed tokens
POST employees/_search
{ "size": 0, "aggs": { "jobs": { "terms": { "field": "job" } } } }
# Response; note that it differs from the keyword result
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 20, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "jobs" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
  { "key" : "programmer", "doc_count" : 11 },
  { "key" : "java", "doc_count" : 7 },
  { "key" : "javascript", "doc_count" : 4 },
  { "key" : "qa", "doc_count" : 3 },
  { "key" : "dba", "doc_count" : 2 },
  { "key" : "designer", "doc_count" : 2 },
  { "key" : "manager", "doc_count" : 2 },
  { "key" : "web", "doc_count" : 2 },
  { "key" : "dev", "doc_count" : 1 },
  { "key" : "product", "doc_count" : 1 }
] } } }
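The difference between the two results can be reproduced offline. The sketch below is an approximation: the analyze helper is a hypothetical stand-in that only lowercases and splits on whitespace, which happens to be enough for these job titles. The job list is taken from the 20 sample employees.

```python
from collections import Counter

# job values of the 20 sample employees, in _id order
jobs = (
    ["Product Manager", "Dev Manager", "Web Designer", "Web Designer"]
    + ["QA"] * 3
    + ["Java Programmer"] * 6
    + ["Javascript Programmer", "Java Programmer"]
    + ["Javascript Programmer"] * 3
    + ["DBA"] * 2
)


def analyze(text):
    """Rough stand-in for text analysis: lowercase, split on whitespace."""
    return text.lower().split()


# keyword field: the whole value is one term
keyword_buckets = Counter(jobs)
# text field: one bucket per token; doc_count counts each document once per token
text_buckets = Counter(token for job in jobs for token in set(analyze(job)))

print(keyword_buckets.most_common(2))  # [('Java Programmer', 7), ('Javascript Programmer', 4)]
print(text_buckets.most_common(3))     # [('programmer', 11), ('java', 7), ('javascript', 4)]
```

The keyword field yields 7 buckets and the text field 10, matching the two responses above.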
Counting the number of distinct terms
# Cardinality of job.keyword; terms aggregations on job.keyword and on job produce different numbers of buckets
POST employees/_search
{ "size": 0, "aggs": { "cardinate": { "cardinality": { "field": "job.keyword" } } } }
# Response
{ "took" : 7, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 20, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "cardinate" : { "value" : 7 } } }
Bucketing by gender
# Terms aggregation on the gender keyword field
POST employees/_search
{ "size": 0, "aggs": { "gender": { "terms": { "field": "gender" } } } }
# Response
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 20, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "gender" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "male", "doc_count" : 12 }, { "key" : "female", "doc_count" : 8 } ] } } }
Specifying size
# Specify the number of buckets to return
POST employees/_search
{ "size": 0, "aggs": { "ages_5": { "terms": { "field": "age", "size": 3 } } } }
# Response
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 20, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "ages_5" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 12, "buckets" : [ { "key" : 25, "doc_count" : 3 }, { "key" : 32, "doc_count" : 3 }, { "key" : 27, "doc_count" : 2 } ] } } }
Bucket Size
# Specify size: details of the 3 oldest employees in each job type
POST employees/_search
{ "size": 0, "aggs": { "jobs": { "terms": { "field": "job.keyword" }, "aggs": { "old_employee": { "top_hits": { "size": 3, "sort": [ { "age": { "order": "desc" } } ] } } } } } }
# Response
{ "took" : 4, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 20, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "jobs" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
  { "key" : "Java Programmer", "doc_count" : 7, "old_employee" : { "hits" : { "total" : { "value" : 7, "relation" : "eq" }, "max_score" : null, "hits" : [
    { "_index" : "employees", "_type" : "_doc", "_id" : "11", "_score" : null, "_source" : { "name" : "Jenny", "age" : 36, "job" : "Java Programmer", "gender" : "female", "salary" : 38000 }, "sort" : [ 36 ] },
    { "_index" : "employees", "_type" : "_doc", "_id" : "15", "_score" : null, "_source" : { "name" : "King", "age" : 33, "job" : "Java Programmer", "gender" : "male", "salary" : 28000 }, "sort" : [ 33 ] },
    { "_index" : "employees", "_type" : "_doc", "_id" : "9", "_score" : null, "_source" : { "name" : "Gregory", "age" : 32, "job" : "Java Programmer", "gender" : "male", "salary" : 22000 }, "sort" : [ 32 ] } ] } } },
  { "key" : "Javascript Programmer", "doc_count" : 4, "old_employee" : { "hits" : { "total" : { "value" : 4, "relation" : "eq" }, "max_score" : null, "hits" : [
    { "_index" : "employees", "_type" : "_doc", "_id" : "14", "_score" : null, "_source" : { "name" : "Marshall", "age" : 32, "job" : "Javascript Programmer", "gender" : "male", "salary" : 25000 }, "sort" : [ 32 ] },
    { "_index" : "employees", "_type" : "_doc", "_id" : "18", "_score" : null, "_source" : { "name" : "Catherine", "age" : 29, "job" : "Javascript Programmer", "gender" : "female", "salary" : 20000 }, "sort" : [ 29 ] },
    { "_index" : "employees", "_type" : "_doc", "_id" : "17", "_score" : null, "_source" : { "name" : "Goodwin", "age" : 25, "job" : "Javascript Programmer", "gender" : "male", "salary" : 16000 }, "sort" : [ 25 ] } ] } } },
  { "key" : "QA", "doc_count" : 3, "old_employee" : { "hits" : { "total" : { "value" : 3, "relation" : "eq" }, "max_score" : null, "hits" : [
    { "_index" : "employees", "_type" : "_doc", "_id" : "6", "_score" : null, "_source" : { "name" : "Lucy", "age" : 31, "job" : "QA", "gender" : "female", "salary" : 25000 }, "sort" : [ 31 ] },
    { "_index" : "employees", "_type" : "_doc", "_id" : "7", "_score" : null, "_source" : { "name" : "Byrd", "age" : 27, "job" : "QA", "gender" : "male", "salary" : 20000 }, "sort" : [ 27 ] },
    { "_index" : "employees", "_type" : "_doc", "_id" : "5", "_score" : null, "_source" : { "name" : "Rose", "age" : 25, "job" : "QA", "gender" : "female", "salary" : 18000 }, "sort" : [ 25 ] } ] } } },
  { "key" : "DBA", "doc_count" : 2, "old_employee" : { "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : null, "hits" : [
    { "_index" : "employees", "_type" : "_doc", "_id" : "19", "_score" : null, "_source" : { "name" : "Boone", "age" : 30, "job" : "DBA", "gender" : "male", "salary" : 30000 }, "sort" : [ 30 ] },
    { "_index" : "employees", "_type" : "_doc", "_id" : "20", "_score" : null, "_source" : { "name" : "Kathy", "age" : 29, "job" : "DBA", "gender" : "female", "salary" : 20000 }, "sort" : [ 29 ] } ] } } },
  { "key" : "Web Designer", "doc_count" : 2, "old_employee" : { "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : null, "hits" : [
    { "_index" : "employees", "_type" : "_doc", "_id" : "4", "_score" : null, "_source" : { "name" : "Rivera", "age" : 26, "job" : "Web Designer", "gender" : "female", "salary" : 22000 }, "sort" : [ 26 ] },
    { "_index" : "employees", "_type" : "_doc", "_id" : "3", "_score" : null, "_source" : { "name" : "Tran", "age" : 25, "job" : "Web Designer", "gender" : "male", "salary" : 18000 }, "sort" : [ 25 ] } ] } } },
  { "key" : "Dev Manager", "doc_count" : 1, "old_employee" : { "hits" : { "total" : { "value" : 1, "relation" : "eq" }, "max_score" : null, "hits" : [
    { "_index" : "employees", "_type" : "_doc", "_id" : "2", "_score" : null, "_source" : { "name" : "Underwood", "age" : 41, "job" : "Dev Manager", "gender" : "male", "salary" : 50000 }, "sort" : [ 41 ] } ] } } },
  { "key" : "Product Manager", "doc_count" : 1, "old_employee" : { "hits" : { "total" : { "value" : 1, "relation" : "eq" }, "max_score" : null, "hits" : [
    { "_index" : "employees", "_type" : "_doc", "_id" : "1", "_score" : null, "_source" : { "name" : "Emma", "age" : 32, "job" : "Product Manager", "gender" : "female", "salary" : 35000 }, "sort" : [ 32 ] } ] } } }
] } } }
# Range buckets
# Salary range buckets; you can define your own key
POST employees/_search
{ "size": 0, "aggs": { "salary_range": { "range": { "field": "salary", "ranges": [ { "to": 10000 }, { "from": 10000, "to": 20000 }, { "key": ">20000", "from": 20000 } ] } } } }
# Response
{ "took" : 4, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 20, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "salary_range" : { "buckets" : [
  { "key" : "*-10000.0", "to" : 10000.0, "doc_count" : 1 },
  { "key" : "10000.0-20000.0", "from" : 10000.0, "to" : 20000.0, "doc_count" : 4 },
  { "key" : ">20000", "from" : 20000.0, "doc_count" : 15 }
] } } }
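The counts in this response follow from how range buckets treat their bounds: from is inclusive and to is exclusive, so a salary of exactly 20000 lands in the last bucket. A small sketch over the sample salaries (the in_range helper is hypothetical):

```python
# salaries of the 20 sample employees, in _id order
salaries = [35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
            38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000]

# Same ranges as the request; None means the bound is open.
ranges = [(None, 10000), (10000, 20000), (20000, None)]


def in_range(value, lower, upper):
    """`from` is inclusive, `to` is exclusive."""
    return (lower is None or value >= lower) and (upper is None or value < upper)


counts = [sum(in_range(s, lo, hi) for s in salaries) for lo, hi in ranges]
print(counts)  # [1, 4, 15]
```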
# Salary histogram: salaries from 0 to 100,000, bucketed in intervals of 5,000
POST employees/_search
{ "size": 0, "aggs": { "salary_histogram": { "histogram": { "field": "salary", "interval": 5000, "extended_bounds": { "min": 0, "max": 100000 } } } } }
Bucket sub-aggregations: a sub-aggregation can be either a Bucket or a Metric aggregation
# Nested aggregation 1: bucket by job type and compute salary statistics
POST employees/_search
{ "size": 0, "aggs": { "Job_salary_stats": { "terms": { "field": "job.keyword" }, "aggs": { "salary": { "stats": { "field": "salary" } } } } } }
# Response
{ "took" : 9, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 20, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "Job_salary_stats" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
  { "key" : "Java Programmer", "doc_count" : 7, "salary" : { "count" : 7, "min" : 9000.0, "max" : 38000.0, "avg" : 25571.428571428572, "sum" : 179000.0 } },
  { "key" : "Javascript Programmer", "doc_count" : 4, "salary" : { "count" : 4, "min" : 16000.0, "max" : 25000.0, "avg" : 19250.0, "sum" : 77000.0 } },
  { "key" : "QA", "doc_count" : 3, "salary" : { "count" : 3, "min" : 18000.0, "max" : 25000.0, "avg" : 21000.0, "sum" : 63000.0 } },
  { "key" : "DBA", "doc_count" : 2, "salary" : { "count" : 2, "min" : 20000.0, "max" : 30000.0, "avg" : 25000.0, "sum" : 50000.0 } },
  { "key" : "Web Designer", "doc_count" : 2, "salary" : { "count" : 2, "min" : 18000.0, "max" : 22000.0, "avg" : 20000.0, "sum" : 40000.0 } },
  { "key" : "Dev Manager", "doc_count" : 1, "salary" : { "count" : 1, "min" : 50000.0, "max" : 50000.0, "avg" : 50000.0, "sum" : 50000.0 } },
  { "key" : "Product Manager", "doc_count" : 1, "salary" : { "count" : 1, "min" : 35000.0, "max" : 35000.0, "avg" : 35000.0, "sum" : 35000.0 } }
] } } }
# Multiple levels of nesting: bucket by job type, then by gender, and compute salary statistics
POST employees/_search
{ "size": 0, "aggs": { "Job_gender_stats": { "terms": { "field": "job.keyword" }, "aggs": { "gender_stats": { "terms": { "field": "gender" }, "aggs": { "salary_stats": { "stats": { "field": "salary" } } } } } } } }
# Response
{ "took" : 3, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 20, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "Job_gender_stats" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
  { "key" : "Java Programmer", "doc_count" : 7, "gender_stats" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
    { "key" : "male", "doc_count" : 5, "salary_stats" : { "count" : 5, "min" : 9000.0, "max" : 32000.0, "avg" : 22200.0, "sum" : 111000.0 } },
    { "key" : "female", "doc_count" : 2, "salary_stats" : { "count" : 2, "min" : 30000.0, "max" : 38000.0, "avg" : 34000.0, "sum" : 68000.0 } } ] } },
  { "key" : "Javascript Programmer", "doc_count" : 4, "gender_stats" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
    { "key" : "male", "doc_count" : 3, "salary_stats" : { "count" : 3, "min" : 16000.0, "max" : 25000.0, "avg" : 19000.0, "sum" : 57000.0 } },
    { "key" : "female", "doc_count" : 1, "salary_stats" : { "count" : 1, "min" : 20000.0, "max" : 20000.0, "avg" : 20000.0, "sum" : 20000.0 } } ] } },
  { "key" : "QA", "doc_count" : 3, "gender_stats" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
    { "key" : "female", "doc_count" : 2, "salary_stats" : { "count" : 2, "min" : 18000.0, "max" : 25000.0, "avg" : 21500.0, "sum" : 43000.0 } },
    { "key" : "male", "doc_count" : 1, "salary_stats" : { "count" : 1, "min" : 20000.0, "max" : 20000.0, "avg" : 20000.0, "sum" : 20000.0 } } ] } },
  { "key" : "DBA", "doc_count" : 2, "gender_stats" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
    { "key" : "female", "doc_count" : 1, "salary_stats" : { "count" : 1, "min" : 20000.0, "max" : 20000.0, "avg" : 20000.0, "sum" : 20000.0 } },
    { "key" : "male", "doc_count" : 1, "salary_stats" : { "count" : 1, "min" : 30000.0, "max" : 30000.0, "avg" : 30000.0, "sum" : 30000.0 } } ] } },
  { "key" : "Web Designer", "doc_count" : 2, "gender_stats" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
    { "key" : "female", "doc_count" : 1, "salary_stats" : { "count" : 1, "min" : 22000.0, "max" : 22000.0, "avg" : 22000.0, "sum" : 22000.0 } },
    { "key" : "male", "doc_count" : 1, "salary_stats" : { "count" : 1, "min" : 18000.0, "max" : 18000.0, "avg" : 18000.0, "sum" : 18000.0 } } ] } },
  { "key" : "Dev Manager", "doc_count" : 1, "gender_stats" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
    { "key" : "male", "doc_count" : 1, "salary_stats" : { "count" : 1, "min" : 50000.0, "max" : 50000.0, "avg" : 50000.0, "sum" : 50000.0 } } ] } },
  { "key" : "Product Manager", "doc_count" : 1, "gender_stats" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
    { "key" : "female", "doc_count" : 1, "salary_stats" : { "count" : 1, "min" : 35000.0, "max" : 35000.0, "avg" : 35000.0, "sum" : 35000.0 } } ] } }
] } } }
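The pipeline concept: the results of an aggregation can themselves be aggregated again
The result of a pipeline aggregation is written into the original response; depending on where it is placed, there are two kinds: Sibling (at the same level as the source aggregation) and Parent (embedded inside the existing buckets)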
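As a rough offline illustration of a pipeline aggregation, the sketch below mimics a min_bucket and a max_bucket over the path jobs>avg_salary: first the per-job average (the terms aggregation with an avg sub-aggregation), then a second pass over those averages. The per-job salaries are taken from the sample employees; variable names are hypothetical.

```python
# salaries per job type, from the sample data
salaries_by_job = {
    "Java Programmer": [20000, 22000, 9000, 38000, 32000, 30000, 28000],
    "Javascript Programmer": [25000, 16000, 16000, 20000],
    "QA": [18000, 25000, 20000],
    "DBA": [30000, 20000],
    "Web Designer": [18000, 22000],
    "Dev Manager": [50000],
    "Product Manager": [35000],
}

# Step 1: "jobs" terms aggregation with its "avg_salary" sub-aggregation.
avg_salary = {job: sum(v) / len(v) for job, v in salaries_by_job.items()}

# Step 2: the pipeline step reads the bucket values via "jobs>avg_salary"
# and reports its result next to "jobs" (a sibling result).
min_job = min(avg_salary, key=avg_salary.get)
max_job = max(avg_salary, key=avg_salary.get)

print(min_job, avg_salary[min_job])  # Javascript Programmer 19250.0
print(max_job, avg_salary[max_job])  # Dev Manager 50000.0
```

The pipeline step never touches the documents again; it only works on the values the first aggregation has already produced.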
# Job type with the lowest average salary
POST employees/_search
{ "size": 0, "aggs": { "jobs": { "terms": { "field": "job.keyword", "size": 10 }, "aggs": { "avg_salary": { "avg": { "field": "salary" } } } }, "min_salary_by_job": { "min_bucket": { "buckets_path": "jobs>avg_salary" } } } }
# Job type with the highest average salary
POST employees/_search
{ "size": 0, "aggs": { "jobs": { "terms": { "field": "job.keyword", "size": 10 }, "aggs": { "avg_salary": { "avg": { "field": "salary" } } } }, "max_salary_by_job": { "max_bucket": { "buckets_path": "jobs>avg_salary" } } } }
# Average of the per-job average salaries
POST employees/_search
{ "size": 0, "aggs": { "jobs": { "terms": { "field": "job.keyword", "size": 10 }, "aggs": { "avg_salary": { "avg": { "field": "salary" } } } }, "avg_salary_by_job": { "avg_bucket": { "buckets_path": "jobs>avg_salary" } } } }
# Statistics over the per-job average salaries
POST employees/_search
{ "size": 0, "aggs": { "jobs": { "terms": { "field": "job.keyword", "size": 10 }, "aggs": { "avg_salary": { "avg": { "field": "salary" } } } }, "stats_salary_by_job": { "stats_bucket": { "buckets_path": "jobs>avg_salary" } } } }
# Percentiles of the per-job average salaries
POST employees/_search
{ "size": 0, "aggs": { "jobs": { "terms": { "field": "job.keyword", "size": 10 }, "aggs": { "avg_salary": { "avg": { "field": "salary" } } } }, "percentiles_salary_by_job": { "percentiles_bucket": { "buckets_path": "jobs>avg_salary" } } } }
# Derivative of the average salary with respect to age
POST employees/_search
{ "size": 0, "aggs": { "age": { "histogram": { "field": "age", "min_doc_count": 1, "interval": 1 }, "aggs": { "avg_salary": { "avg": { "field": "salary" } }, "derivative_avg_salary": { "derivative": { "buckets_path": "avg_salary" } } } } } }
# Cumulative sum
POST employees/_search
{ "size": 0, "aggs": { "age": { "histogram": { "field": "age", "min_doc_count": 1, "interval": 1 }, "aggs": { "avg_salary": { "avg": { "field": "salary" } }, "cumulative_salary": { "cumulative_sum": { "buckets_path": "avg_salary" } } } } } }
# Moving function
POST employees/_search
{ "size": 0, "aggs": { "age": { "histogram": { "field": "age", "min_doc_count": 1, "interval": 1 }, "aggs": { "avg_salary": { "avg": { "field": "salary" } }, "moving_avg_salary": { "moving_fn": { "buckets_path": "avg_salary", "window": 10, "script": "MovingFunctions.min(values)" } } } } } }
By default, an ES aggregation runs over the result set of the query
ES also supports the following ways to change the scope of an aggregation: Filter, Post Filter and Global
# Scope
# Query scope
POST employees/_search
{ "size": 0, "query": { "range": { "age": { "gte": 20 } } }, "aggs": { "jobs": { "terms": { "field": "job.keyword" } } } }
# Filter scope
POST employees/_search
{ "size": 0, "aggs": { "older_person": { "filter": { "range": { "age": { "from": 35 } } }, "aggs": { "jobs": { "terms": { "field": "job.keyword" } } } }, "all_jobs": { "terms": { "field": "job.keyword" } } } }
# post_filter: a single request aggregates over all job types, while the returned hits are limited to those matching the filter
POST employees/_search
{ "aggs": { "jobs": { "terms": { "field": "job.keyword" } } }, "post_filter": { "match": { "job.keyword": "Dev Manager" } } }
# global
POST employees/_search
{ "size": 0, "query": { "range": { "age": { "gte": 40 } } }, "aggs": { "jobs": { "terms": { "field": "job.keyword" } }, "all": { "global": {}, "aggs": { "salary_avg": { "avg": { "field": "salary" } } } } } }
Sorting:
Specify order to sort the buckets by count and key
# Sorting with order
# by count and key
POST employees/_search
{ "size": 0, "query": { "range": { "age": { "gte": 20 } } }, "aggs": { "jobs": { "terms": { "field": "job.keyword", "order": [ { "_count": "asc" }, { "_key": "desc" } ] } } } }
# Sorting with order
# by the value of a sub-aggregation (avg_salary)
POST employees/_search
{ "size": 0, "aggs": { "jobs": { "terms": { "field": "job.keyword", "order": [ { "avg_salary": "desc" } ] }, "aggs": { "avg_salary": { "avg": { "field": "salary" } } } } } }
# Sorting with order
# by one value of a multi-value sub-aggregation (stats_salary.min)
POST employees/_search
{ "size": 0, "aggs": { "jobs": { "terms": { "field": "job.keyword", "order": [ { "stats_salary.min": "desc" } ] }, "aggs": { "stats_salary": { "stats": { "field": "salary" } } } } } }
Use cases:
In general, reindexing is needed in the following situations
Built-in APIs provided by ES
UpdateByQuery: rebuilds on top of the existing index
Reindex: rebuilds the index into another index
Case 1
# Rebuild the index
DELETE blogs/
# Index a document
PUT blogs/_doc/1
{ "content": "Hadoop is cool", "keyword": "hadoop" }
# View the mapping
GET blogs/_mapping
# Update the mapping: add a sub-field that uses the english analyzer
PUT blogs/_mapping
{ "properties" : { "content" : { "type" : "text", "fields" : { "english" : { "type" : "text", "analyzer": "english" } } } } }
# Index another document
PUT blogs/_doc/2
{ "content": "Elasticsearch rocks", "keyword": "elasticsearch" }
# Search for the newly indexed document
POST blogs/_search
{ "query": { "match": { "content.english": "Elasticsearch" } } }
# Search for the document indexed before the mapping change
POST blogs/_search
{ "query": { "match": { "content.english": "Hadoop" } } }
# Update all documents
POST blogs/_update_by_query
{ }
# After update_by_query, search again for the document indexed earlier
POST blogs/_search
{ "query": { "match": { "content.english": "Hadoop" } } }
Case 2: changing the mapping of an existing field
# View the mapping
GET blogs/_mapping
# Response: the keyword field has type text
{ "blogs" : { "mappings" : { "properties" : { "content" : { "type" : "text", "fields" : { "english" : { "type" : "text", "analyzer" : "english" }, "keyword" : { "type" : "keyword", "ignore_above" : 256 } } }, "keyword" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } } } } } }
# Trying to change the type fails: ES does not allow the type of an existing field to be changed
PUT blogs/_mapping
{ "properties" : { "content" : { "type" : "text", "fields" : { "english" : { "type" : "text", "analyzer" : "english" } } }, "keyword" : { "type" : "keyword" } } }
# Create a new index with the new mapping
PUT blogs_fix/
{ "mappings": { "properties" : { "content" : { "type" : "text", "fields" : { "english" : { "type" : "text", "analyzer" : "english" } } }, "keyword" : { "type" : "keyword" } } } }
# Reindex API
POST _reindex
{ "source": { "index": "blogs" }, "dest": { "index": "blogs_fix" } }
# Check the new index
GET blogs_fix/_doc/1
# Response
{ "_index" : "blogs_fix", "_type" : "_doc", "_id" : "1", "_version" : 1, "_seq_no" : 0, "_primary_term" : 1, "found" : true, "_source" : { "content" : "Hadoop is cool", "keyword" : "hadoop" } }
# Test a terms aggregation
POST blogs_fix/_search
{ "size": 0, "aggs": { "blog_keyword": { "terms": { "field": "keyword", "size": 10 } } } }
# The field is now of type keyword, so the terms aggregation works without enabling fielddata
# Response
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "blog_keyword" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "elasticsearch", "doc_count" : 1 }, { "key" : "hadoop", "doc_count" : 1 } ] } } }
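Reindex summary
The Reindex API copies documents from one index into another index
When to use the Reindex API:
Ingest Node
A node type introduced in ES 5.0; with the default configuration, every node is an Ingest Node
Data can be pre-processed without Logstash, for example:
Demo
Create a document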
# Blog data with 3 fields; tags are separated by commas
PUT tech_blogs/_doc/1
{ "title": "Introducing big data......", "tags": "hadoop,elasticsearch,spark", "content": "You konw, for big data" }
# Simulate a pipeline that splits the tags field on commas
POST _ingest/pipeline/_simulate
{
  "pipeline": { "description": "to split blog tags", "processors": [ { "split": { "field": "tags", "separator": "," } } ] },
  "docs": [
    { "_index": "index", "_id": "id", "_source": { "title": "Introducing big data......", "tags": "hadoop,elasticsearch,spark", "content": "You konw, for big data" } },
    { "_index": "index", "_id": "idxx", "_source": { "title": "Introducing cloud computering", "tags": "openstack,k8s", "content": "You konw, for cloud" } }
  ]
}
# Also add a field to each document (views, the blog's view count) with a set processor
POST _ingest/pipeline/_simulate
{
  "pipeline": { "description": "to split blog tags", "processors": [ { "split": { "field": "tags", "separator": "," } }, { "set": { "field": "views", "value": 0 } } ] },
  "docs": [
    { "_index": "index", "_id": "id", "_source": { "title": "Introducing big data......", "tags": "hadoop,elasticsearch,spark", "content": "You konw, for big data" } },
    { "_index": "index", "_id": "idxx", "_source": { "title": "Introducing cloud computering", "tags": "openstack,k8s", "content": "You konw, for cloud" } }
  ]
}
The requests above are simulations for testing only; once testing is done, create the pipeline in ES
PUT _ingest/pipeline/blog_pipeline
{ "description": "a blog pipeline", "processors": [ { "split": { "field": "tags", "separator": "," } }, { "set": { "field": "views", "value": 0 } } ] }
# View the pipeline
GET _ingest/pipeline/blog_pipeline
# Test the pipeline; only the array of documents needs to be provided
POST _ingest/pipeline/blog_pipeline/_simulate
{ "docs": [ { "_source": { "title": "Introducing cloud computering", "tags": "openstack,k8s", "content": "You konw, for cloud" } } ] }
# Test 2: clear the index
DELETE tech_blogs
# Index a document without the pipeline
PUT tech_blogs/_doc/1
{ "title": "Introducing big data......", "tags": "hadoop,elasticsearch,spark", "content": "You konw, for big data" }
# Index a document through the pipeline
PUT tech_blogs/_doc/2?pipeline=blog_pipeline
{ "title": "Introducing cloud computering", "tags": "openstack,k8s", "content": "You konw, for cloud" }
# View both documents: one has been processed, the other has not
POST tech_blogs/_search
{}
# An unconditional update_by_query results in an error
POST tech_blogs/_update_by_query?pipeline=blog_pipeline
{ }
# Add a condition to update_by_query: only documents without a views field
POST tech_blogs/_update_by_query?pipeline=blog_pipeline
{ "query": { "bool": { "must_not": { "exists": { "field": "views" } } } } }
# Search again; this time document 1 has also been processed by the pipeline
POST tech_blogs/_search
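The pipeline behaviour can be mimicked offline. The sketch below uses hypothetical helper functions, not the Elasticsearch API. It also shows one plausible reason (my reading, not stated in these notes) why the unconditional update_by_query fails: document 2 has already been processed, so its tags field is an array and can no longer be split as a string.

```python
def split_processor(doc, field, separator):
    """Mimics the split processor: the field must still be a string."""
    value = doc[field]
    if not isinstance(value, str):
        raise TypeError(f"field [{field}] is not a string, cannot split")
    doc[field] = value.split(separator)


def set_processor(doc, field, value):
    """Mimics the set processor: assign a fixed value to a field."""
    doc[field] = value


def blog_pipeline(doc):
    """Same steps as blog_pipeline: split tags on ',', then set views to 0."""
    processed = dict(doc)
    split_processor(processed, "tags", ",")
    set_processor(processed, "views", 0)
    return processed


doc = {"title": "Introducing cloud computering", "tags": "openstack,k8s"}
once = blog_pipeline(doc)
print(once["tags"], once["views"])  # ['openstack', 'k8s'] 0

try:
    blog_pipeline(once)  # run the pipeline again on an already processed document
except TypeError as err:
    print("second run fails:", err)
```

Restricting update_by_query to documents without a views field, as in the last request above, skips the documents that were already processed.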
Some built-in processors
Painless
Uses of Painless:
Processing document fields
Running scripts in an Ingest Pipeline
Processing data during Reindex API and Update By Query operations
# Demo for Painless
# Add a script processor
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "to split blog tags",
    "processors": [
      { "split": { "field": "tags", "separator": "," } },
      { "script": { "source": """
          if(ctx.containsKey("content")){
            ctx.content_length = ctx.content.length();
          }else{
            ctx.content_length=0;
          }
        """ } },
      { "set": { "field": "views", "value": 0 } }
    ]
  },
  "docs": [
    { "_index": "index", "_id": "id", "_source": { "title": "Introducing big data......", "tags": "hadoop,elasticsearch,spark", "content": "You konw, for big data" } },
    { "_index": "index", "_id": "idxx", "_source": { "title": "Introducing cloud computering", "tags": "openstack,k8s", "content": "You konw, for cloud" } }
  ]
}
DELETE tech_blogs
PUT tech_blogs/_doc/1
{ "title": "Introducing big data......", "tags": "hadoop,elasticsearch,spark", "content": "You konw, for big data", "views": 0 }
POST tech_blogs/_update/1
{ "script": { "source": "ctx._source.views += params.new_views", "params": { "new_views": 100 } } }
# Check the views count
POST tech_blogs/_search
{ }
# Store the script in the cluster state
POST _scripts/update_views
{ "script": { "lang": "painless", "source": "ctx._source.views += params.new_views" } }
POST tech_blogs/_update/1
{ "script": { "id": "update_views", "params": { "new_views": 1000 } } }
GET tech_blogs/_search
{
  "script_fields": {
    "rnd_views": {
      "script": {
        "lang": "painless",
        "source": """
          java.util.Random rnd = new Random();
          doc['views'].value+rnd.nextInt(1000);
        """
      }
    }
  },
  "query": { "match_all": {} }
}
Original source: https://www.cnblogs.com/xzkzzz/p/12119387.html