Elasticsearch Essentials: Advanced Topics
Source: cnblogs  Author: JD Cloud Developers  Date: 2023/1/18 8:43:49

JD Logistics: Kang Rui, Yao Zaiyi, Li Zhen, Liu Bin, Wang Beiyong

Note: everything below is based on Elasticsearch 8.1.

I. Cross-Cluster Search (CCS)

Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/modules-cross-cluster-search.html

Background and motivation for cross-cluster search

What is cross-cluster search

Setting up a cross-cluster search environment


Step 1: Set up two local single-node clusters. For local practice you can disable the security configuration.

Step 2: Run the following command on each cluster:

PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_one": {
          "seeds": [ "172.21.0.14:9301" ]
        },
        "cluster_two": {
          "seeds": [ "172.21.0.14:9302" ]
        }
      }
    }
  }
}

Step 3: Verify that the clusters can reach each other.

Option 1: check visually in Kibana: Stack Management -> Remote Clusters -> the status should be Connected, with a green check mark.

Option 2: GET _remote/info
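If the clusters are reachable, GET _remote/info returns one entry per registered remote with a connected flag. A sketch of the response shape (values will differ in your environment, and some fields are abridged here):

```
GET _remote/info

# Example response shape:
{
  "cluster_one": {
    "connected": true,
    "seeds": [ "172.21.0.14:9301" ],
    "num_nodes_connected": 1,
    "skip_unavailable": false
  },
  "cluster_two": { ... }
}
```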

Cross-cluster search walkthrough

# Step 1: index a document into cluster 1
PUT test01/_bulk
{"index":{"_id":1}}
{"title":"this is from cluster01..."}
# Step 2: index a document into cluster 2
PUT test01/_bulk
{"index":{"_id":1}}
{"title":"this is from cluster02..."}
# Step 3: run a cross-cluster search
# Syntax: POST cluster_name1:index_name,cluster_name2:index_name/_search
POST cluster_one:test01,cluster_two:test01/_search

# The response merges hits from both clusters:
{
  "took" : 7,
  "timed_out" : false,
  "num_reduce_phases" : 3,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "_clusters" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "cluster_two:test01",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "this is from cluster02..."
        }
      },
      {
        "_index" : "cluster_one:test01",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "this is from cluster01..."
        }
      }
    ]
  }
}

II. Cross-Cluster Replication (CCR, a paid feature)

Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/xpack-ccr.html

How to keep a cluster highly available

1. Replicas
2. Snapshot and restore
3. Cross-cluster replication (similar to MySQL primary/replica synchronization)
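For the snapshot-and-restore option, a minimal sketch; the repository name `my_backup` and the location are placeholders, and the location must be listed under `path.repo` in elasticsearch.yml:

```
# Register a shared-filesystem snapshot repository
PUT _snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/es_backups"
  }
}
# Take a snapshot of all indices and wait for it to finish
PUT _snapshot/my_backup/snapshot_1?wait_for_completion=true
```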

Cross-cluster replication overview

Configuring cross-cluster replication

1. Prepare two clusters with network connectivity between them
2. Enable a license; a 30-day trial is available
  • Where: Stack Management -> License Management
3. Decide which cluster is the leader and which is the follower
4. On the follower cluster, register the leader cluster as a remote cluster
5. On the follower cluster, configure the index replication rule for the leader's index (in Kibana)
  a. Stack Management -> Cross Cluster Replication -> Create a follower index
6. Enable the configuration from step 5
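Besides the Kibana UI in step 5, a follower index can be created directly with the CCR follow API. A sketch, assuming the leader cluster is registered as `cluster_one` and `leader_index01` / `follower_index01` are placeholder index names:

```
# Run on the follower cluster: create follower_index01
# and start replicating from cluster_one's leader_index01
PUT /follower_index01/_ccr/follow
{
  "remote_cluster": "cluster_one",
  "leader_index": "leader_index01"
}
```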


III. Index Templates

Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-templates.html

Component templates in 8.x

1. Create a component template: index settings

# Component template: index settings
PUT _component_template/template_sttting_part
{
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 0
    }
  }
}

2. Create a component template: index mappings

# Component template: index mappings
PUT _component_template/template_mapping_part
{
  "template": {
    "mappings": {
      "properties": {
        "host_name": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z yyyy"
        }
      }
    }
  }
}

3. Create the index template that binds the component templates to indices

// Note: if multiple component templates in composed_of define the same setting,
// later ones override earlier ones; the order matters.
# Bind the component templates to an index pattern:
# every index matching tem_* picks up these rules at creation time
PUT _index_template/template_1
{
  "index_patterns": [
    "tem_*"
  ],
  "composed_of": [
    "template_sttting_part",
    "template_mapping_part"
  ]
}

4. Test

# Create a test index
PUT tem_001
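Before creating a real index, you can also preview what a matching index would receive with the simulate API; it shows the merged settings and mappings from the component templates without creating anything (`tem_002` is just an example name matching the pattern):

```
# Preview the configuration an index named tem_002 would get
POST _index_template/_simulate_index/tem_002
```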

Basic index template operations

Hands-on practice

Requirement 1: without an explicit mapping, numeric values are dynamically mapped to long. The business values here are all small, however, so long wastes storage; the default should be integer instead.

Index templates, official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-templates.html

Mapping dynamic templates, official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/dynamic-templates.html

# Combine a mapping dynamic template with an index template
# 1. Create a component template: the mapping part
PUT _component_template/template_mapping_part_01
{
  "template": {
    "mappings": {
      "dynamic_templates": [
        {
          "integers": {
            "match_mapping_type": "long",
            "mapping": {
              "type": "integer"
            }
          }
        }
      ]
    }
  }
}
# 2. Bind the component template to an index pattern
PUT _index_template/template_2
{
  "index_patterns": ["tem1_*"],
  "composed_of": ["template_mapping_part_01"]
}
# 3. Create test data
POST tem1_001/_doc/1
{
  "age": 18
}
# 4. Check the mapping to verify
GET tem1_001/_mapping

Requirement 2: fields whose names start with date_ should all be mapped to the date type.


# Combine a mapping dynamic template with an index template
# 1. Create a component template: the mapping part
#    (extends the previous one with a date rule)
PUT _component_template/template_mapping_part_01
{
  "template": {
    "mappings": {
      "dynamic_templates": [
        {
          "integers": {
            "match_mapping_type": "long",
            "mapping": {
              "type": "integer"
            }
          }
        },
        {
          "date_type_process": {
            "match": "date_*",
            "mapping": {
              "type": "date",
              "format": "yyyy-MM-dd HH:mm:ss"
            }
          }
        }
      ]
    }
  }
}
# 2. Bind the component template to an index pattern
PUT _index_template/template_2
{
  "index_patterns": ["tem1_*"],
  "composed_of": ["template_mapping_part_01"]
}
# 3. Create test data
POST tem1_001/_doc/2
{
  "age": 19,
  "date_aoe": "2022-01-01 18:18:00"
}
# 4. Check the mapping to verify
GET tem1_001/_mapping

IV. ILM: Index Lifecycle Management

Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-lifecycle-management.html

What is an index lifecycle?

An index is born, grows old, gets sick, and dies.

Have you ever considered what happens if an index, once created, is never managed again?

What is index lifecycle management?

What happens when an index grows too large?

• Restoring a large index takes far, far longer than restoring a small one
• Once an index grows large, searches slow down, and writes and updates suffer to varying degrees
• Past a certain size, a health problem in the index can make the cluster's core business unavailable

Best practices

A single shard holds at most 2^31 - 1 documents (about 2.1 billion, a Lucene-level limit). Official guidance: keep each shard between 30 GB and 50 GB. If an index grows without bound, it will certainly exceed these limits.
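To check actual shard sizes and document counts against those limits, the cat shards API is handy:

```
# List shards with doc count and on-disk size, largest first
GET _cat/shards?v&h=index,shard,prirep,docs,store&s=store:desc
```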

Users rarely care about the full history

In some scenarios the business cares mainly about recent data, such as the last 3 or 7 days. A huge index that lumps all history together serves such queries poorly.

How index lifecycle management evolved

ILM prelude: rollover (rolling indices)

Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-rollover.html

# 0. Precondition for self-testing: the ILM rollover poll interval defaults to 10 minutes
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "1s"
  }
}
# 1. Create an index with a write alias
PUT test_index-0001
{
  "aliases": {
    "my-test-index-alias": {
      "is_write_index": true
    }
  }
}
# 2. Bulk-load some data
PUT my-test-index-alias/_bulk
{"index":{"_id":1}}
{"title":"testing 01"}
{"index":{"_id":2}}
{"title":"testing 02"}
{"index":{"_id":3}}
{"title":"testing 03"}
{"index":{"_id":4}}
{"title":"testing 04"}
{"index":{"_id":5}}
{"title":"testing 05"}
# 3. Roll over when any of these conditions is met
POST my-test-index-alias/_rollover
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 5,
    "max_primary_shard_size": "50gb"
  }
}
# 4. Write through the alias again; this lands in the new rolled-over index
PUT my-test-index-alias/_bulk
{"index":{"_id":7}}
{"title":"testing 07"}
# 5. Search to verify the rollover succeeded
POST my-test-index-alias/_search
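To check whether the conditions would trigger without actually rolling over, add the dry_run query parameter (the conditions here mirror step 3); the response reports which conditions matched:

```
POST my-test-index-alias/_rollover?dry_run
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 5,
    "max_primary_shard_size": "50gb"
  }
}
```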

ILM prelude: shrink (compacting an index)

Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ilm-shrink.html

Core steps:

1. Move all of the index's shards onto a single node

2. Block writes to the index

3. Only then can the shrink be performed

# 1. Prepare test data
DELETE kibana_sample_data_logs_ext
PUT kibana_sample_data_logs_ext
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0
  }
}
POST _reindex
{
  "source": {
    "index": "kibana_sample_data_logs"
  },
  "dest": {
    "index": "kibana_sample_data_logs_ext"
  }
}
# 2. Required settings before shrinking
# number_of_replicas: 0 replicas
# index.routing.allocation.include._tier_preference: route all shards to hot nodes
# index.blocks.write: block further writes to the index
PUT kibana_sample_data_logs_ext/_settings
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.routing.allocation.include._tier_preference": "data_hot",
    "index.blocks.write": true
  }
}
# 3. Perform the shrink
POST kibana_sample_data_logs_ext/_shrink/kibana_sample_data_logs_ext_shrink
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.number_of_shards": 1,
    "index.codec": "best_compression"
  },
  "aliases": {
    "kibana_sample_data_logs_alias": {}
  }
}
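Note that the target's number_of_shards must be a factor of the source's shard count (5 -> 1 works here; 5 -> 2 would be rejected). After shrinking, the document counts of source and target should match:

```
GET kibana_sample_data_logs_ext/_count
GET kibana_sample_data_logs_ext_shrink/_count
```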

ILM in practice

The big picture: the four phases

Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/overview-index-lifecycle-management.html

Lifecycle management phases (policy):
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ilm-index-lifecycle.html

Hot phase (birth)

Set priority

Unfollow

Rollover

Read-only

Shrink

Force Merge

Searchable snapshot

Warm phase (aging)

Set priority

Unfollow

Read-only

Allocate

Migrate

Shrink

Force Merge

Cold phase (sickness)

Searchable snapshot

Delete phase (death)

Delete

Walkthrough

1. Create the policy

  • Hot phase: rollover with max_age 3d, max_docs 5, max_size 50gb; priority 100

  • Warm phase: min_age 15s; force-merge segments; migrate from hot nodes to warm nodes; set replicas to 0; priority 50

  • Cold phase: min_age 30s; migrate from warm to cold

  • Delete phase: min_age 45s; delete the index

PUT _ilm/policy/kr_20221114_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "set_priority": {
            "priority": 100
          },
          "rollover": {
            "max_size": "50gb",
            "max_primary_shard_size": "50gb",
            "max_age": "3d",
            "max_docs": 5
          }
        }
      },
      "warm": {
        "min_age": "15s",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          },
          "set_priority": {
            "priority": 50
          },
          "allocate": {
            "number_of_replicas": 0
          }
        }
      },
      "cold": {
        "min_age": "30s",
        "actions": {
          "set_priority": {
            "priority": 0
          }
        }
      },
      "delete": {
        "min_age": "45s",
        "actions": {
          "delete": {
            "delete_searchable_snapshot": true
          }
        }
      }
    }
  }
}

2. Create the index template

PUT _index_template/kr_20221114_template
{
  "index_patterns": ["kr_index-**"],
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "kr_20221114_policy",
          "rollover_alias": "kr-index-alias"
        },
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_hot"
            }
          }
        },
        "number_of_shards": "3",
        "number_of_replicas": "1"
      }
    },
    "aliases": {},
    "mappings": {}
  }
}

3. For testing, shorten the ILM rollover poll interval

PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "1s"
  }
}

4. Run the test

# Create the bootstrap index with a write alias
PUT kr_index-0001
{
  "aliases": {
    "kr-index-alias": {
      "is_write_index": true
    }
  }
}
# Add data through the alias
PUT kr-index-alias/_bulk
{"index":{"_id":1}}
{"title":"testing 01"}
{"index":{"_id":2}}
{"title":"testing 02"}
{"index":{"_id":3}}
{"title":"testing 03"}
{"index":{"_id":4}}
{"title":"testing 04"}
{"index":{"_id":5}}
{"title":"testing 05"}
# Add more data through the alias to trigger the rollover
PUT kr-index-alias/_bulk
{"index":{"_id":6}}
{"title":"testing 06"}
# Check the indices
GET kr_index-0001
GET _cat/indices?v
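To watch the index move through the phases defined in kr_20221114_policy, the ILM explain API reports the current phase, action, and step for each index it manages:

```
GET kr_index-0001/_ilm/explain
```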

Process recap

Step 1: configure the ILM policy

  • Horizontally: phases (hot, warm, cold, delete), i.e. birth, aging, sickness, death

  • Vertically: actions within each phase (rollover, forcemerge, readonly, delete)

Step 2: create a template bound to the policy, and specify the rollover alias

Step 3: create the bootstrap index

Step 4: the index rolls over according to the policy from step 1


V. Data Streams

Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ilm-actions.html

Key characteristics

A data stream stores time-series data across multiple indices while exposing a single named endpoint (the data stream name).

  • Write and search requests are sent to the data stream

  • The data stream routes those requests to its backing indices

Backing indices

Each data stream is made up of multiple hidden backing indices

  • created automatically

  • an index template is required

The rollover mechanism automatically creates new backing indices

  • the newest backing index becomes the data stream's write index

Use cases

Logs, events, metrics, and other continuously generated (rarely updated) business data; two core traits:

1. Time-series data
2. Data that is rarely or never updated

Core steps to create a data stream

Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/set-up-a-data-stream.html

Set up a data stream

To set up a data stream, follow these steps:

  1. Create an index lifecycle policy
  2. Create component templates
  3. Create an index template
  4. Create the data stream
  5. Secure the data stream

Walkthrough

1. Create a data stream named my-data-stream

2. The index template is named my-index-template

3. It applies to every index matching "my-data-stream*"

4. On ingest, data lands on data_hot nodes

5. After 3 minutes the stream should roll over and the data move to data_warm nodes

6. After another 5 minutes the data should move to data_cold nodes

# Step 1: create the ILM policy
PUT _ilm/policy/my-lifecycle-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "3m",
            "max_docs": 5
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "5m",
        "actions": {
          "allocate": {
            "number_of_replicas": 0
          },
          "forcemerge": {
            "max_num_segments": 1
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "cold": {
        "min_age": "6m",
        "actions": {
          "freeze": {}
        }
      },
      "delete": {
        "min_age": "45s",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
# Step 2: create a component template for mappings
PUT _component_template/my-mappings
{
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date",
          "format": "date_optional_time||epoch_millis"
        },
        "message": {
          "type": "wildcard"
        }
      }
    }
  },
  "_meta": {
    "description": "Mappings for @timestamp and message fields",
    "my-custom-meta-field": "More arbitrary metadata"
  }
}
# Step 3: create a component template for settings
PUT _component_template/my-settings
{
  "template": {
    "settings": {
      "index.lifecycle.name": "my-lifecycle-policy",
      "index.routing.allocation.include._tier_preference": "data_hot"
    }
  },
  "_meta": {
    "description": "Settings for ILM",
    "my-custom-meta-field": "More arbitrary metadata"
  }
}
# Step 4: create the index template
PUT _index_template/my-index-template
{
  "index_patterns": ["my-data-stream*"],
  "data_stream": { },
  "composed_of": [ "my-mappings", "my-settings" ],
  "priority": 500,
  "_meta": {
    "description": "Template for my time series data",
    "my-custom-meta-field": "More arbitrary metadata"
  }
}
# Step 5: create the data stream and write test data
PUT my-data-stream/_bulk
{ "create":{ } }
{ "@timestamp": "2099-05-06T16:21:15.000Z", "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736" }
{ "create":{ } }
{ "@timestamp": "2099-05-06T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" }
POST my-data-stream/_doc
{
  "@timestamp": "2099-05-06T16:21:15.000Z",
  "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736"
}
# Step 6: inspect the data stream's backing indices
GET /_resolve/index/my-data-stream*
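Besides _resolve, the data stream API lists the stream's generation and backing indices directly (backing indices follow the .ds-&lt;stream&gt;-&lt;date&gt;-&lt;generation&gt; naming pattern), and a manual rollover can be forced for testing:

```
# Inspect the data stream, its generation, and its backing indices
GET _data_stream/my-data-stream
# Force a rollover: creates a new backing index and makes it the write index
POST my-data-stream/_rollover
```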

Original article: https://www.cnblogs.com/Jcloud/p/17056216.html
