Elasticsearch Performance Tuning Guide: Recommended Practical DSL

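This article extends the six posts listed below. It does not aim to be exhaustive, only to be useful in practice.

1. Practical Tips | A Checklist of Common Commands for Elasticsearch Operations

2. Practical Tips | Elasticsearch Top 10 Monitoring Metrics

3. The Elasticsearch Top X Key Metrics You Must Watch

4. Practical Tips | A Checklist of Common Commands for Elasticsearch Development

5. Practical Tips | A Best-Practice Guide for Elasticsearch Developers

6. A Collection of Common Commands for Elasticsearch Development and Operations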

1. View unassigned shards

GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state:asc
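To act on this output from a script, the same request can be issued with `format=json` and filtered in code. The following is a minimal sketch that assumes the response body has already been fetched; the sample payload and the helper name `unassigned_shards` are hypothetical and only illustrate the shape of the data.

```python
import json

# Hypothetical sample of what
# GET _cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason
# might return; a real cluster will differ.
SAMPLE = """[
  {"index": "test", "shard": "0", "prirep": "p", "state": "STARTED", "unassigned.reason": null},
  {"index": "test", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "unassigned.reason": "NODE_LEFT"},
  {"index": "logs", "shard": "1", "prirep": "r", "state": "UNASSIGNED", "unassigned.reason": "INDEX_CREATED"}
]"""


def unassigned_shards(cat_shards_json):
    """Return (index, shard, prirep, reason) for every UNASSIGNED shard."""
    rows = json.loads(cat_shards_json)
    return [
        (row["index"], int(row["shard"]), row["prirep"], row.get("unassigned.reason"))
        for row in rows
        if row["state"] == "UNASSIGNED"
    ]


if __name__ == "__main__":
    for index, shard, prirep, reason in unassigned_shards(SAMPLE):
        print(f"{index}[{shard}] ({prirep}) unassigned: {reason}")
```

Working from the JSON form avoids splitting on whitespace, which is fragile here because the `unassigned.reason` column is empty for shards that are already started.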

2. Adjust the number of replicas dynamically

PUT my-index-2021.05.30-000002/_settings
{"number_of_replicas": 0}

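Note: the number of primary shards of an existing index cannot be changed (except through the shrink or split APIs), but the number of replicas can be adjusted dynamically.

3. Re-enable shard allocation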

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}

4. Manually move or allocate shards

POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "test",
        "shard": 0,
        "from_node": "node1",
        "to_node": "node2"
      }
    },
    {
      "allocate_replica": {
        "index": "test",
        "shard": 1,
        "node": "node3"
      }
    }
  ]
}

5. Check disk usage

GET /_cat/allocation?v

Disk usage of 85% or more means the node has reached the low disk watermark (85% by default), so an alert should be raised.
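The check can be automated along the following lines. This is a minimal sketch, assuming the `_cat/allocation` response was requested with `format=json` and already fetched; the thresholds are the Elasticsearch defaults (low 85%, high 90%, flood stage 95%), and the sample payload and function names are hypothetical.

```python
import json

# Default disk watermarks (percent of disk used), highest first.
# A cluster may override these in its settings.
WATERMARKS = [(95, "flood_stage"), (90, "high"), (85, "low")]

# Hypothetical sample of GET /_cat/allocation?format=json output.
SAMPLE = """[
  {"node": "node1", "disk.percent": "72"},
  {"node": "node2", "disk.percent": "86"},
  {"node": "node3", "disk.percent": "96"},
  {"node": "UNASSIGNED", "disk.percent": null}
]"""


def watermark_level(disk_percent):
    """Return the highest watermark reached, or None if below all of them."""
    for threshold, name in WATERMARKS:
        if disk_percent >= threshold:
            return name
    return None


def nodes_over_watermark(cat_allocation_json):
    """Map node name -> watermark name for nodes at or above a watermark."""
    alerts = {}
    for row in json.loads(cat_allocation_json):
        if row.get("disk.percent") is None:
            continue  # the UNASSIGNED row carries no disk statistics
        level = watermark_level(int(row["disk.percent"]))
        if level is not None:
            alerts[row["node"]] = level
    return alerts
```

A monitoring job could call `nodes_over_watermark` on each poll and page someone as soon as the result is non-empty.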

6. Check the version of each node

GET /_cat/nodes?v&h=host,name,version

Running nodes with mismatched versions in the same cluster can trigger all kinds of unexpected errors.
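A mismatch is easy to detect in a script. The sketch below assumes the `_cat/nodes` response was requested with `format=json` and already fetched; the sample payload, host addresses, and function names are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical sample of GET /_cat/nodes?format=json&h=host,name,version output.
SAMPLE = """[
  {"host": "10.0.0.1", "name": "node1", "version": "7.14.0"},
  {"host": "10.0.0.2", "name": "node2", "version": "7.14.0"},
  {"host": "10.0.0.3", "name": "node3", "version": "7.13.4"}
]"""


def nodes_by_version(cat_nodes_json):
    """Group node names by the version string each node reports."""
    groups = defaultdict(list)
    for row in json.loads(cat_nodes_json):
        groups[row["version"]].append(row["name"])
    return dict(groups)


def has_version_mismatch(cat_nodes_json):
    """True when the cluster reports more than one distinct version."""
    return len(nodes_by_version(cat_nodes_json)) > 1
```

During a rolling upgrade a mismatch is expected for a while, so a check like this is most useful once the upgrade is supposed to be finished.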

7. Search performance tuning in practice

7.1 Slow log settings

PUT /my-index-000001/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.search.slowlog.threshold.query.debug": "2s",
  "index.search.slowlog.threshold.query.trace": "500ms",
  "index.search.slowlog.threshold.fetch.warn": "1s",
  "index.search.slowlog.threshold.fetch.info": "800ms",
  "index.search.slowlog.threshold.fetch.debug": "500ms",
  "index.search.slowlog.threshold.fetch.trace": "200ms",
  "index.search.slowlog.level": "info"
}
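To make the thresholds concrete, the sketch below mimics how a query duration would be matched against them: the entry is logged at the most severe level whose threshold it exceeds. This illustrates the idea and is not Elasticsearch's own code; the function names are hypothetical and only the `ms`, `s`, and `m` time units are handled.

```python
import re

# Query-phase thresholds copied from the settings above, most severe first.
QUERY_THRESHOLDS = [
    ("warn", "10s"),
    ("info", "5s"),
    ("debug", "2s"),
    ("trace", "500ms"),
]

_UNITS_MS = {"ms": 1, "s": 1000, "m": 60_000}


def to_millis(value):
    """Convert a time value such as '500ms' or '10s' to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m)", value)
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    return int(match.group(1)) * _UNITS_MS[match.group(2)]


def slowlog_level(took_ms, thresholds=QUERY_THRESHOLDS):
    """Return the slow log level for a query that took `took_ms`, or None."""
    for level, limit in thresholds:
        if took_ms > to_millis(limit):
            return level
    return None
```

With these thresholds a query taking 6 seconds maps to `info` and one taking 600 milliseconds maps to `trace`. As far as I understand it, with `index.search.slowlog.level` set to `info` the `debug` and `trace` entries would not be written.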

7.2 Build a mapping that requires routing

PUT my-index-000002
{
  "mappings": {
    "_routing": {
      "required": true 
    }
  }
}

7.3 Segment merging

POST /my-index-000001/_forcemerge

8. Write optimization

8.1 Bulk writes

POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
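Client code usually has to assemble this body itself. The `_bulk` API expects newline-delimited JSON (Content-Type `application/x-ndjson`): one action line, an optional source line, and a final trailing newline. The following sketch rebuilds the request above; the helper name `build_bulk_body` and the tuple layout are hypothetical.

```python
import json


def build_bulk_body(actions):
    """Serialize (action, metadata, source) tuples to the NDJSON _bulk format.

    `source` is None for actions that carry no document line (delete).
    """
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))
        if source is not None:
            lines.append(json.dumps(source))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# The same four operations as in the request above.
ACTIONS = [
    ("index", {"_index": "test", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_id": "2"}, None),
    ("create", {"_index": "test", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_index": "test", "_id": "1"}, {"doc": {"field2": "value2"}}),
]

if __name__ == "__main__":
    print(build_bulk_body(ACTIONS), end="")
```

Each document is serialized onto a single line, so pretty-printed JSON must not be used inside a bulk body.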

8.2 Increase the refresh interval (refresh less often)

PUT /my-index-000001/_settings
{
  "index" : {
    "refresh_interval" : "30s"
  }
}

8.3 Set replicas to 0 to speed up writes

PUT my-index-000001/_settings
{
  "number_of_replicas": 0
}
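These two settings are usually relaxed only for the duration of a large load and restored afterwards. The sketch below builds the pair of `_settings` requests for that workflow without sending them. The function name is hypothetical, and the restore values (`1s` refresh interval and 1 replica, which are the Elasticsearch defaults) are assumptions that should be replaced with the index's real configuration.

```python
def bulk_load_settings(index, load_refresh="30s", normal_refresh="1s", normal_replicas=1):
    """Return (method, path, body) requests to apply before and after a bulk load."""
    path = f"/{index}/_settings"
    # Before the load: refresh less often and drop replicas.
    before = {"index": {"refresh_interval": load_refresh, "number_of_replicas": 0}}
    # After the load: restore the assumed normal values.
    after = {
        "index": {
            "refresh_interval": normal_refresh,
            "number_of_replicas": normal_replicas,
        }
    }
    return [("PUT", path, before), ("PUT", path, after)]


if __name__ == "__main__":
    for method, path, body in bulk_load_settings("my-index-000001"):
        print(method, path, body)
```

With zero replicas a node failure during the load can lose data, so the source data should stay available until the replicas have been restored.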

8.4 Asynchronous translog flush to disk

PUT my-index-2021.06.03/_settings
{
  "index": {
    "translog": {
      "durability": "async"
    }
  }
}

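9. Heap memory tuning in practice

Set the heap size in the jvm.options configuration file (it cannot be adjusted dynamically; a restart is required for the change to take effect). Use the -Xms and -Xmx flags with the same value, for example 3 GB:

-Xms3g
-Xmx3g

The ES_HEAP_SIZE=DESIRED_SIZE (e.g. "3g") environment variable applies only to Elasticsearch versions before 5.0.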
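A common rule of thumb is to give the heap no more than half of the machine's RAM and to keep it below the roughly 32 GB threshold for compressed object pointers. The sketch below turns that rule into the `-Xms`/`-Xmx` lines that jvm.options expects in recent versions. The function names and the 31 GB cap are assumptions for illustration; the exact safe cap depends on the JVM and platform.

```python
def recommended_heap_gb(total_ram_gb, max_heap_gb=31):
    """Half of physical RAM, capped below the ~32 GB compressed-oops limit.

    The 31 GB default cap is an assumption, not an official constant.
    """
    if total_ram_gb < 2:
        raise ValueError("need at least 2 GB of RAM for this rule of thumb")
    return min(total_ram_gb // 2, max_heap_gb)


def jvm_heap_options(total_ram_gb):
    """Return the jvm.options lines for the recommended heap size."""
    heap = recommended_heap_gb(total_ram_gb)
    # Minimum and maximum heap are set to the same value.
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


if __name__ == "__main__":
    for ram in (6, 16, 128):
        print(ram, "GB RAM ->", " ".join(jvm_heap_options(ram)))
```

The other half of the RAM is left to the operating system's file system cache, which Lucene relies on heavily.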

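10. Handling insufficient disk space

The three disk watermarks are worth stressing once more. If the nodes of an Elasticsearch cluster run short of disk space, cluster performance suffers.

Once free storage drops below specific thresholds, Elasticsearch starts blocking write operations, which prevents new data from entering the cluster. So how can capacity be expanded?

For the three watermarks, see: The Elasticsearch Top X Key Metrics You Must Watch.

First: scale out by adding data nodes (provided that shards are distributed reasonably).

Second: scale up by upgrading the machines or adding disks (path.data may need to be adjusted).

Third: migrate data. Old data that is no longer used can be moved to another cluster or archived to another machine.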

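Summary

Performance tuning cannot be accomplished overnight. The "How to" section of the official documentation is recommended reading: it covers indexing and search performance optimization in more detail, more comprehensively, and in a more targeted way.

This article is a summary written from the angle of troubleshooting with commands, and I hope it helps with diagnosing production issues. For performance optimization, the following hands-on articles can deepen your understanding.

Elasticsearch Performance Optimization: A Practical Guide
Make Elasticsearch Fly! Practical Tips for Performance Optimization
A 20x Improvement in Elasticsearch Large-File Search Performance (Practical Tips)
Six Powerful Moves for Optimizing Elasticsearch Aggregation Performance
Elasticsearch High-Cardinality Aggregation Performance Improved 3x: What Changed?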

References

elasticsearch_monitoring_cheatsheet.pdf

https://www.elastic.co/guide/en/elasticsearch/reference/7.14/modules-cluster.html#disk-based-shard-allocation

https://www.elastic.co/guide/en/elasticsearch/reference/7.14/how-to.html