标签:analyzer recent 生成 too 干货 hits 名称 lang 均值
概要Elasticsearch的聚合查询,跟数据库的聚合查询效果是一样的,我们可以将二者拿来对比学习,如求和、求平均值、求最大最小等等。
数据分组,一些数据按照某个字段进行bucket划分,这个字段值相同的数据放到一个bucket中。可以理解成Java中的Map<String, List<Object>>结构,类似于Mysql中的group by后的查询结果。
对一个数据分组执行的统计,比如计算最大值,最小值,平均值等
类似于Mysql中的max(),min(),avg()函数的值,都是在group by后使用的。
我们还是以英文儿歌为案例背景,回顾一下索引结构:
PUT /music
{
"mappings": {
"children": {
"properties": {
"id": {
"type": "keyword"
},
"author_first_name": {
"type": "text",
"analyzer": "english"
},
"author_last_name": {
"type": "text",
"analyzer": "english"
},
"author": {
"type": "text",
"analyzer": "english",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"content": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"language": {
"type": "text",
"analyzer": "english",
"fielddata": true
},
"tags": {
"type": "text",
"analyzer": "english"
},
"length": {
"type": "long"
},
"likes": {
"type": "long"
},
"isRelease": {
"type": "boolean"
},
"releaseDate": {
"type": "date"
}
}
}
}
}
GET /music/children/_search
{
"size": 0,
"aggs": {
"song_qty_by_language": {
"terms": {
"field": "language"
}
}
}
}
语法解释:
响应结果如下:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 0,
"hits": []
},
"aggregations": {
"song_qty_by_language": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "english",
"doc_count": 5
}
]
}
}
}
语法解释:
默认按doc_count降序排序。
GET /music/children/_search
{
"size": 0,
"aggs": {
"lang": {
"terms": {
"field": "language"
},
"aggs": {
"length_avg": {
"avg": {
"field": "length"
}
}
}
}
}
}
这里演示的是两层aggs聚合查询,先按语种统计,得到数据分组,再在数据分组里算平均时长。
多个aggs嵌套语法也是如此,注意一下aggs代码块的位置即可。
最常用的统计:count,avg,max,min,sum,语法含义与mysql相同。
GET /music/children/_search
{
"size": 0,
"aggs": {
"color": {
"terms": {
"field": "language"
},
"aggs": {
"length_avg": {
"avg": {
"field": "length"
}
},
"length_max": {
"max": {
"field": "length"
}
},
"length_min": {
"min": {
"field": "length"
}
},
"length_sum": {
"sum": {
"field": "length"
}
}
}
}
}
}
以30秒为一段,看各段区间的平均值。
histogram语法位置跟terms一样,作范围分区,搭配interval参数一起使用
interval:30表示分的区间段为[0,30),[30,60),[60,90),[90,120)
段的闭合关系是左开右闭,如果数据在某段区间内没有,也会返回空的区间。
GET /music/children/_search
{
"size": 0,
"aggs": {
"sales_price_range": {
"histogram": {
"field": "length",
"interval": 30
},
"aggs": {
"length_avg": {
"avg": {
"field": "length"
}
}
}
}
}
}
这种数据的结果可以用来生成柱状图或折线图。
按月统计
date histogram与histogram语法类似,搭配date interval指定区间间隔
extended_bounds表示最大的时间范围。
GET /music/children/_search
{
"size": 0,
"aggs": {
"sales": {
"date_histogram": {
"field": "releaseDate",
"interval": "month",
"format": "yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds": {
"min": "2019-10-01",
"max": "2019-12-31"
}
}
}
}
}
interval的值可以天、周、月、季度、年等。我们可以延伸一下,比如统计今年每个季度的新发布歌曲的点赞数量
GET /music/children/_search
{
"size": 0,
"aggs": {
"sales": {
"date_histogram": {
"field": "releaseDate",
"interval": "quarter",
"format": "yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds": {
"min": "2019-01-01",
"max": "2019-12-31"
}
},
"aggs": {
"lang_qty": {
"terms": {
"field": "language"
},
"aggs": {
"like_sum": {
"sum": {
"field": "likes"
}
}
}
},
"total" :{
"sum": {
"field": "likes"
}
}
}
}
}
}
聚合查询可以和query搭配使用,相当于mysql中where与group by联合使用
GET /music/children/_search
{
"size": 0,
"query": {
"match": {
"language": "english"
}
},
"aggs": {
"sales": {
"terms": {
"field": "language"
}
}
}
}
GET /music/children/_search
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"term": {
"language": "english"
}
}
}
},
"aggs": {
"sales": {
"terms": {
"field": "language"
}
}
}
}
global:就是global bucket,会将所有的数据纳入聚合scope,不受前面的query或filter影响。
global bucket适用于同时统计指定条件的数据与全部数据的对比,如我们创造的场景:指定作者的歌与全部歌曲的点赞数量对比。
GET /music/children/_search
{
"size": 0,
"query": {
"match": {
"author": "Jean Ritchie"
}
},
"aggs": {
"likes": {
"sum": {
"field": "likes"
}
},
"all": {
"global": {},
"aggs": {
"all_likes": {
"sum": {
"field": "likes"
}
}
}
}
}
}
aggs.filter针对是聚合里的数据
bucket filter:对不同的bucket下的aggs,进行filter
类似于mysql的中having语法
GET /music/children/_search
{
"size": 0,
"aggs": {
"recent_60d": {
"filter": {
"range": {
"releaseDate": {
"gte": "now-60d"
}
}
},
"aggs": {
"recent_60d_likes_sum": {
"sum": {
"field": "likes"
}
}
}
},
"recent_30d": {
"filter": {
"range": {
"releaseDate": {
"gte": "now-30d"
}
}
},
"aggs": {
"recent_30d_likes_sum": {
"avg": {
"field": "likes"
}
}
}
}
}
}
默认按doc_count降序排序,排序规则可以改,order里面可以指定aggs的别名,如length_avg,类似于mysql的order by cnt asc。
GET /music/children/_search
{
"size": 0,
"aggs": {
"group_by_lang": {
"terms": {
"field": "language",
"order": {
"length_avg": "desc"
}
},
"aggs": {
"length_avg": {
"avg": {
"field": "length"
}
}
}
}
}
}
本篇主要介绍常用的聚合查询,均以示例为主,了解基本写法后可以快速阅读,有不好理解的地方,多与我们熟悉的数据库查询SQL作比较,谢谢。
专注Java高并发、分布式架构,更多技术干货分享与心得,请关注公众号:Java架构社区
可以扫左边二维码添加好友,邀请你加入Java架构社区微信群共同探讨技术
标签:analyzer recent 生成 too 干货 hits 名称 lang 均值
原文地址:https://blog.51cto.com/2123175/2507370