Using Elasticsearch: Composite Aggregation
Founder
2024-06-02 01:28:39

Composite aggregation

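Composite ([kəmˈpɑːzət]) aggregation is a bucket aggregation.

It creates composite buckets from different sources and can paginate through the results of a multi-level aggregation. It provides a way to stream all buckets of a given aggregation, much like scroll does for documents.

Composite aggregation is currently not compatible with pipeline aggregations.

Composite aggregation builds combinations from the values of each document; each combination is treated as one composite bucket.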

For example, consider the following document:

{
  "keyword": ["foo", "bar"],
  "number": [23, 65, 76]
}

Applying a composite aggregation to it produces the following composite buckets:

{ "keyword": "foo", "number": 23 }
{ "keyword": "foo", "number": 65 }
{ "keyword": "foo", "number": 76 }
{ "keyword": "bar", "number": 23 }
{ "keyword": "bar", "number": 65 }
{ "keyword": "bar", "number": 76 }
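
The six buckets listed above are the Cartesian product of the document's values for each source. The following is a minimal Python sketch of that expansion. The helper name `composite_keys` is hypothetical; it only illustrates the idea and is not how Elasticsearch implements it.

```python
from itertools import product

def composite_keys(doc, sources):
    """Return every combination of the document's values for the given source fields."""
    # Treat a scalar field as a one-element list so product() works uniformly.
    value_lists = [doc[f] if isinstance(doc[f], list) else [doc[f]] for f in sources]
    return [dict(zip(sources, combo)) for combo in product(*value_lists)]

doc = {"keyword": ["foo", "bar"], "number": [23, 65, 76]}
keys = composite_keys(doc, ["keyword", "number"])
for key in keys:
    print(key)
```

Two keyword values times three number values give the six combinations, in the same order as the list above.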
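
The composite aggregation accepts the following parameters:

  • sources: the list of aggregation sources. Each source name must be unique.

  • missing_bucket: set per source, defaults to false. With false, documents that have no value for a source are skipped, so if a source has no values at all the composite result is an empty bucket list ([]). With true, those documents are kept: the source without a value outputs null, and the other sources are output normally.

  • size: the maximum number of composite buckets to return. Defaults to 10.

  • after: the starting point of the current page, i.e. the key of the last bucket of the previous page.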
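
To show how these parameters fit together in one request body, here is a small Python sketch that assembles it as a dictionary. The helper `build_composite_body` and the aggregation name `by_route` are hypothetical names chosen for illustration; only the keys inside the body come from the parameters described above.

```python
import json

def build_composite_body(name, sources, size=10, after=None):
    """Build a search request body containing a single composite aggregation."""
    composite = {"size": size, "sources": sources}
    if after is not None:
        # `after` is the key of the last bucket of the previous page.
        composite["after"] = after
    # "size": 0 suppresses the document hits, as in the requests in this article.
    return {"size": 0, "aggs": {name: {"composite": composite}}}

sources = [
    # missing_bucket is set on an individual source, not on the composite itself.
    {"terms_OriginCountry": {"terms": {"field": "OriginCountry", "missing_bucket": True}}},
    {"terms_DestCountry": {"terms": {"field": "DestCountry"}}},
]
body = build_composite_body("by_route", sources, size=5)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after a `GET kibana_sample_data_flights/_search` line in the Kibana console.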

Sources

Four aggregations can be used as sources: terms, histogram, date_histogram, and geotile_grid.

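terms aggregation as a source

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "composite_FlightTimeMin": {
      "composite": {
        "sources": [
          { "terms_FlightTimeMin": { "terms": { "field": "FlightTimeMin" } } }
        ]
      }
    }
  }
}

With a single terms source, this groups documents by the same keys as a plain terms aggregation; the composite buckets are returned in key order and can be paginated.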

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "runtime_mappings": {
    "FlightTimeMinChanged": {
      "type": "double",
      "script": {
        "source": """emit(doc['FlightTimeMin'].value / 10)"""
      }
    }
  },
  "aggs": {
    "composite_FlightTimeMinChanged": {
      "composite": {
        "sources": [
          { "terms_FlightTimeMinChanged": { "terms": { "field": "FlightTimeMinChanged" } } }
        ]
      }
    }
  }
}

As the request above shows, runtime fields can also be used to create composite buckets.

histogram aggregation as a source

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "composite_FlightTimeMin": {
      "composite": {
        "sources": [
          { "histogram_FlightTimeMin": { "histogram": { "field": "FlightTimeMin", "interval": 10 } } }
        ]
      }
    }
  }
}
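
A histogram source rounds each value down to the start of its interval, and that rounded value becomes the bucket key. The sketch below reproduces the rounding in Python for an interval of 10; the function name `histogram_key` and the sample flight times are illustrative assumptions, not values taken from the index.

```python
import math

def histogram_key(value, interval, offset=0.0):
    """Return the lower bound of the histogram bucket that contains `value`."""
    return math.floor((value - offset) / interval) * interval + offset

# Illustrative flight times in minutes (assumed sample values).
for flight_time in (13.01, 32.96, 100.0):
    print(flight_time, "->", histogram_key(flight_time, 10))
```

Every value in the range [10, 20) therefore lands in the bucket whose key is 10, and so on for each interval.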

date_histogram aggregation as a source

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "composite_timestamp": {
      "composite": {
        "sources": [
          {
            "date_histogram_timestamp": {
              "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "format": "yyyy-MM-dd" }
            }
          }
        ]
      }
    }
  }
}

Combining multiple sources

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "composite_timestamp_FlightTimeMin": {
      "composite": {
        "sources": [
          {
            "date_histogram_timestamp": {
              "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "format": "yyyy-MM-dd" }
            }
          },
          { "terms_FlightTimeMin": { "terms": { "field": "FlightTimeMin" } } }
        ]
      }
    }
  }
}

Specifying a sort order for each source

Buckets are sorted by the first source, then by the second, and so on.

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "composite_timestamp_FlightTimeMin": {
      "composite": {
        "sources": [
          {
            "date_histogram_timestamp": {
              "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "format": "yyyy-MM-dd", "order": "desc" }
            }
          },
          { "terms_FlightTimeMin": { "terms": { "field": "FlightTimeMin", "order": "asc" } } }
        ]
      }
    }
  }
}
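
The resulting bucket order can be pictured as a multi-key sort: the first source is the most significant key and each source has its own direction. The Python sketch below simulates this locally with stable sorts. The function `sort_composite`, the short key names, and the sample keys are assumptions for illustration only.

```python
def sort_composite(keys, orders):
    """Sort composite keys by several sources, most significant source first.

    `orders` is a list of (source_name, "asc" | "desc") pairs.
    """
    result = list(keys)
    # Python's sort is stable, so sorting by the least significant source
    # first and the most significant last yields the combined ordering.
    for name, order in reversed(orders):
        result.sort(key=lambda k: k[name], reverse=(order == "desc"))
    return result

# Assumed sample keys: a day bucket and a flight time.
keys = [
    {"date": "2022-08-27", "flight_time": 20.5},
    {"date": "2022-08-28", "flight_time": 32.9},
    {"date": "2022-08-28", "flight_time": 13.0},
    {"date": "2022-08-27", "flight_time": 11.2},
]
ordered = sort_composite(keys, [("date", "desc"), ("flight_time", "asc")])
for key in ordered:
    print(key)
```

The later date comes first, and within each date the flight times ascend, which mirrors the "desc" then "asc" configuration of the request above.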

Composite aggregation compared with sub-aggregations

First, use a composite aggregation with two terms sources, on the OriginCountry and DestCountry fields.

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "composite_OriginCountry_DestCountry": {
      "composite": {
        "sources": [
          { "terms_OriginCountry": { "terms": { "field": "OriginCountry" } } },
          { "terms_DestCountry": { "terms": { "field": "DestCountry" } } }
        ]
      }
    }
  }
}

The aggregation result is as follows:

"aggregations" : {
  "composite_OriginCountry_DestCountry" : {
    "after_key" : {
      "terms_OriginCountry" : "AE",
      "terms_DestCountry" : "CA"
    },
    "buckets" : [
      {
        "key" : { "terms_OriginCountry" : "AE", "terms_DestCountry" : "AE" },
        "doc_count" : 9
      },
      {
        "key" : { "terms_OriginCountry" : "AE", "terms_DestCountry" : "AR" },
        "doc_count" : 10
      },
      ...

For comparison, here is the same grouping written as a terms aggregation with a terms sub-aggregation.

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "terms_OriginCountry": {
      "terms": { "field": "OriginCountry" },
      "aggs": {
        "terms_DestCountry": {
          "terms": { "field": "DestCountry" }
        }
      }
    }
  }
}

The aggregation result is as follows:

"aggregations" : {
  "terms_OriginCountry" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 4114,
    "buckets" : [
      {
        "key" : "IT",
        "doc_count" : 2278,
        "terms_DestCountry" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 513,
          "buckets" : [
            { "key" : "IT", "doc_count" : 459 },
            { "key" : "US", "doc_count" : 328 },
            { "key" : "CN", "doc_count" : 195 },
            { "key" : "CA", "doc_count" : 192 },
            ...

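The two responses differ mainly in shape: the sub-aggregation nests the DestCountry buckets inside each OriginCountry bucket, while the composite aggregation returns one flat list of key pairs. As a rough illustration, the Python sketch below flattens a nested terms response into composite-style buckets. The function name `flatten_nested_terms` is hypothetical, and the sample input is the fragment of the nested result shown above.

```python
def flatten_nested_terms(outer_buckets, outer_name, inner_name):
    """Turn nested terms buckets into a flat list of composite-style buckets."""
    flat = []
    for outer in outer_buckets:
        for inner in outer[inner_name]["buckets"]:
            flat.append({
                "key": {outer_name: outer["key"], inner_name: inner["key"]},
                "doc_count": inner["doc_count"],
            })
    return flat

# Fragment of the nested response shown above.
nested = [
    {
        "key": "IT",
        "doc_count": 2278,
        "terms_DestCountry": {
            "buckets": [
                {"key": "IT", "doc_count": 459},
                {"key": "US", "doc_count": 328},
                {"key": "CN", "doc_count": 195},
                {"key": "CA", "doc_count": 192},
            ]
        },
    }
]
flat = flatten_nested_terms(nested, "terms_OriginCountry", "terms_DestCountry")
for bucket in flat:
    print(bucket)
```

Note that a terms aggregation only returns its top buckets at each level (the rest is summarized in sum_other_doc_count), so flattening a nested response does not necessarily cover every combination, whereas a composite aggregation can page through all of them.
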
The missing_bucket parameter

In the second source we specify a field that does not exist, FlightTimeMin2. Changing the value of missing_bucket and comparing the results shows what the parameter does.

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "composite_timestamp_FlightTimeMin": {
      "composite": {
        "sources": [
          {
            "date_histogram_timestamp": {
              "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "format": "yyyy-MM-dd", "order": "desc" }
            }
          },
          { "terms_FlightTimeMin": { "terms": { "field": "FlightTimeMin2", "order": "asc", "missing_bucket": false } } }
        ]
      }
    }
  }
}
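
The effect of missing_bucket can be mimicked locally. The Python sketch below is a simplified simulation, not Elasticsearch code: it assumes single-valued fields, applies one missing_bucket flag to all sources (in Elasticsearch it is set per source), and uses the raw field value as the key. The function name `bucket_keys` and the two sample documents are hypothetical.

```python
def bucket_keys(docs, sources, missing_bucket=False):
    """Build one composite key per document from a {source_name: field} mapping."""
    keys = []
    for doc in docs:
        key = {}
        for name, field in sources.items():
            if field in doc:
                key[name] = doc[field]
            elif missing_bucket:
                key[name] = None   # keep the document, with a null key for this source
            else:
                break              # no value for this source: skip the whole document
        else:
            keys.append(key)       # reached only when no source caused a skip
    return keys

docs = [{"timestamp": "2022-08-28"}, {"timestamp": "2022-08-29"}]
sources = {"date_histogram_timestamp": "timestamp", "terms_FlightTimeMin": "FlightTimeMin2"}

print(bucket_keys(docs, sources, missing_bucket=False))
print(bucket_keys(docs, sources, missing_bucket=True))
```

With the flag off, no document has a value for FlightTimeMin2, so nothing is bucketed; with the flag on, each document still produces a key whose second component is None.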

The after parameter

The after_key of the previous page gives the key of its last bucket.

"after_key" : {
  "date_histogram_timestamp" : "2022-08-28",
  "terms_FlightTimeMin" : 32.9625244140625
}

Next, set the after parameter to the after_key above, so that the request returns the page that follows the previous one.

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "composite_timestamp_FlightTimeMin": {
      "composite": {
        "size": 5,
        "after": {
          "date_histogram_timestamp": "2022-08-28",
          "terms_FlightTimeMin": 32.9625244140625
        },
        "sources": [
          {
            "date_histogram_timestamp": {
              "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "format": "yyyy-MM-dd" }
            }
          },
          { "terms_FlightTimeMin": { "terms": { "field": "FlightTimeMin", "missing_bucket": true } } }
        ]
      }
    }
  }
}
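
Paging through every bucket follows a simple loop: request a page, read its last key, pass that key as after in the next request, and stop when a page comes back empty. The Python sketch below simulates that loop against an in-memory, already sorted list of keys instead of a real cluster. `fetch_page` is a hypothetical stand-in for one composite request, and the keys are (date, flight time) pairs built from values that appear in this article.

```python
def fetch_page(all_keys, size, after=None):
    """Stand-in for one composite request: up to `size` keys that sort after `after`."""
    remaining = [k for k in all_keys if after is None or k > after]
    page = remaining[:size]
    after_key = page[-1] if page else None  # key of the last bucket on this page
    return page, after_key

def iterate_all(all_keys, size):
    """Yield every key by repeatedly feeding after_key back in as `after`."""
    after = None
    while True:
        page, after = fetch_page(all_keys, size, after)
        if not page:
            break
        yield from page

# Keys are assumed to be sorted ascending, as composite buckets are by default.
all_keys = [
    ("2022-08-28", 13.010112762451172),
    ("2022-08-28", 16.21676254272461),
    ("2022-08-28", 17.2014217376709),
    ("2022-08-28", 32.9625244140625),
]
for key in iterate_all(all_keys, size=3):
    print(key)
```

Against a real cluster, the body of `fetch_page` would be replaced by a search request whose composite aggregation carries the same size and after values.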

Sub-aggregations are supported

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "composite_timestamp_FlightTimeMin": {
      "composite": {
        "size": 2,
        "after": {
          "date_histogram_timestamp": "2022-08-28",
          "terms_FlightTimeMin": 13.010112762451172
        },
        "sources": [
          {
            "date_histogram_timestamp": {
              "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "format": "yyyy-MM-dd" }
            }
          },
          { "terms_FlightTimeMin": { "terms": { "field": "FlightTimeMin", "missing_bucket": true } } }
        ]
      },
      "aggs": {
        "stats_FlightTimeMin": {
          "stats": { "field": "FlightTimeMin" }
        }
      }
    }
  }
}

The aggregation result is as follows:

"aggregations" : {
  "composite_timestamp_FlightTimeMin" : {
    "after_key" : {
      "date_histogram_timestamp" : "2022-08-28",
      "terms_FlightTimeMin" : 17.2014217376709
    },
    "buckets" : [
      {
        "key" : {
          "date_histogram_timestamp" : "2022-08-28",
          "terms_FlightTimeMin" : 16.21676254272461
        },
        "doc_count" : 1,
        "stats_FlightTimeMin" : {
          "count" : 1,
          "min" : 16.21676254272461,
          "max" : 16.21676254272461,
          "avg" : 16.21676254272461,
          "sum" : 16.21676254272461
        }
      },
      {
        "key" : {
          "date_histogram_timestamp" : "2022-08-28",
          "terms_FlightTimeMin" : 17.2014217376709
        },
        "doc_count" : 1,
        "stats_FlightTimeMin" : {
          "count" : 1,
          "min" : 17.2014217376709,
          "max" : 17.2014217376709,
          "avg" : 17.2014217376709,
          "sum" : 17.2014217376709
        }
      }
    ]
  }
}

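In the result above, each bucket holds a single document, which is why min, max, avg, and sum are all equal to that document's FlightTimeMin. The short Python sketch below computes the same five statistics for the values of one bucket; the function name `bucket_stats` is a hypothetical helper, not part of any Elasticsearch client.

```python
def bucket_stats(values):
    """Compute the five fields reported by a stats sub-aggregation for one bucket."""
    count = len(values)
    total = sum(values)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": total / count,
        "sum": total,
    }

# The first bucket above contains one document with this FlightTimeMin.
single = bucket_stats([16.21676254272461])
print(single)
```
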