Elasticsearch: an approach to fixing precision loss in aggregations

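Java's floating-point types (double and float) can lose precision in arithmetic, because many decimal fractions have no exact binary representation.
Elasticsearch (ES) is written in Java, so the same problem appears in ES. The example below shows how to work around it in ES.
ES version: 6.5.4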
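
The effect can be reproduced outside ES. The following is a minimal sketch in Python, whose float is the same 64-bit IEEE 754 format as Java's double; it simply adds the three prices used in the example further down and is not ES code.

```python
# The three prices indexed later in this article, held as 64-bit doubles.
prices = [1.0, 20.2, 300.03]

total = 0.0
for p in prices:
    total += p  # each addition rounds to the nearest representable double

print(total)            # 321.22999999999996
print(total == 321.23)  # False
```

The error is tiny, but it is visible as soon as the sum is printed or compared with the expected decimal value.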

Start ES quickly as a Docker container:

docker run --name es6 --net host -e "discovery.type=single-node" docker.io/elasticsearch:6.5.4

Worked example. Create the index:

curl -X PUT http://127.0.0.1:9200/index01

Delete the index (only when you want to start over):

curl -XDELETE http://127.0.0.1:9200/index01

Create the mapping:

curl -XPOST 'http://127.0.0.1:9200/index01/type01/_mapping?pretty' -H "Content-Type: application/json"  \
-d '
{
    "type01": {
        "properties": {
            "tm": {
                "type": "date",
                "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
            },
            "name": {
                "type": "keyword"
            },
            "address": {
                "type": "text"
            },
            "price1": {
                "type": "double"
            },
            "price2": {
                "type": "scaled_float",
                "scaling_factor": 100
            }
        }
    }
}'

View the mapping:

curl -XGET 'http://127.0.0.1:9200/index01/_mapping?pretty'

Insert data:

curl -XPOST http://127.0.0.1:9200/index01/type01/01?pretty -H "Content-Type: application/json" \
    -d '{
        "name":"zhangsan",
        "price1":1.0,
        "price2":1.0,
        "tm":"2022-01-01",
        "address":"beijing daxing"
        }'
        
curl -XPOST http://127.0.0.1:9200/index01/type01/02?pretty -H "Content-Type: application/json" \
    -d '{
        "name":"zhangsan",
        "price1":20.2,
        "price2":20.2,
        "tm":"2022-01-01",
        "address":"beijing daxing"
        }'
        
curl -XPOST http://127.0.0.1:9200/index01/type01/03?pretty -H "Content-Type: application/json" \
    -d '{
        "name":"zhangsan",
        "price1":300.03,
        "price2":300.03,
        "tm":"2022-01-01",
        "address":"beijing daxing"
        }'

Query the data:

curl -XGET 'http://127.0.0.1:9200/index01/type01/_search?pretty'

Aggregation on the double field (price1):

curl -XGET 'http://127.0.0.1:9200/index01/type01/_search?pretty' -H "Content-Type: application/json" \
    -d '{
            "aggs": {
                "sum_price1": {
                    "sum": {
                        "field": "price1"
                    }
                }
            }
        }'

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "index01",
        "_type" : "type01",
        "_id" : "01",
        "_score" : 1.0,
        "_source" : {
          "name" : "zhangsan",
          "price1" : 1.0,
          "price2" : 1.0,
          "tm" : "2022-01-01",
          "address" : "beijing daxing"
        }
      },
      {
        "_index" : "index01",
        "_type" : "type01",
        "_id" : "03",
        "_score" : 1.0,
        "_source" : {
          "name" : "zhangsan",
          "price1" : 300.03,
          "price2" : 300.03,
          "tm" : "2022-01-01",
          "address" : "beijing daxing"
        }
      },
      {
        "_index" : "index01",
        "_type" : "type01",
        "_id" : "02",
        "_score" : 1.0,
        "_source" : {
          "name" : "zhangsan",
          "price1" : 20.2,
          "price2" : 20.2,
          "tm" : "2022-01-01",
          "address" : "beijing daxing"
        }
      }
    ]
  },
  "aggregations" : {
    "sum_price1" : {
      "value" : 321.22999999999996
    }
  }
}

Aggregation on the scaled_float field (price2):

curl -XGET 'http://127.0.0.1:9200/index01/type01/_search?pretty' -H "Content-Type: application/json" \
    -d '{
            "aggs": {
                "sum_price2": {
                    "sum": {
                        "field": "price2"
                    }
                }
            }
        }'

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "index01",
        "_type" : "type01",
        "_id" : "01",
        "_score" : 1.0,
        "_source" : {
          "name" : "zhangsan",
          "price1" : 1.0,
          "price2" : 1.0,
          "tm" : "2022-01-01",
          "address" : "beijing daxing"
        }
      },
      {
        "_index" : "index01",
        "_type" : "type01",
        "_id" : "03",
        "_score" : 1.0,
        "_source" : {
          "name" : "zhangsan",
          "price1" : 300.03,
          "price2" : 300.03,
          "tm" : "2022-01-01",
          "address" : "beijing daxing"
        }
      },
      {
        "_index" : "index01",
        "_type" : "type01",
        "_id" : "02",
        "_score" : 1.0,
        "_source" : {
          "name" : "zhangsan",
          "price1" : 20.2,
          "price2" : 20.2,
          "tm" : "2022-01-01",
          "address" : "beijing daxing"
        }
      }
    ]
  },
  "aggregations" : {
    "sum_price2" : {
      "value" : 321.23
    }
  }
}

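Analysis of the results: the double field loses precision when summed (321.22999999999996 instead of 321.23). The scaled_float field, given a suitable scaling factor, returns 321.23 here and so avoids the floating-point precision loss.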
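
The idea behind a scaled type can be sketched in a few lines of Python: scale each value to an integer, add the integers exactly, and divide once at the end. This illustrates the general technique only and does not describe how ES implements the sum aggregation internally. The helper name to_scaled is hypothetical, and rounding half up is assumed.

```python
import math

SCALING_FACTOR = 100  # same value as in the price2 mapping


def to_scaled(value, factor=SCALING_FACTOR):
    """Multiply by the factor and round half up to an integer (assumed rounding mode)."""
    return math.floor(value * factor + 0.5)


prices = [1.0, 20.2, 300.03]
scaled = [to_scaled(p) for p in prices]  # [100, 2020, 30003]
scaled_total = sum(scaled)               # 32123, exact integer arithmetic

print(scaled_total / SCALING_FACTOR)     # 321.23
```

Because the additions happen on integers, rounding error cannot accumulate across documents; the only floating-point step is the final division.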

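Notes:

You need to know the maximum number of decimal places in the data loaded into price2. scaling_factor must be at least 10^n, where n is that number of decimal places (for example, 100 for two decimal places); otherwise precision may be lost. Also, a scaled_float field must always specify scaling_factor.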
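
One way to pick the factor is to inspect the source data before creating the mapping. The sketch below assumes the prices are available as decimal strings (for example, straight from a CSV export); min_scaling_factor is a hypothetical helper, not part of ES or any client library.

```python
from decimal import Decimal


def min_scaling_factor(values):
    """Return the smallest power of ten that keeps every decimal place in `values`."""
    max_places = 0
    for text in values:
        exponent = Decimal(text).as_tuple().exponent  # "300.03" -> -2
        if exponent < 0:
            max_places = max(max_places, -exponent)
    return 10 ** max_places


print(min_scaling_factor(["1.0", "20.2", "300.03"]))  # 100
```

Trailing zeros count as decimal places here ("1.50" yields 100), so the result errs on the side of a larger factor.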
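At index time ES multiplies the original value by the scaling factor and rounds the result to the nearest integer. ES stores that integer internally, but _source still returns the original value. For example, a scaled_float field with a scaling_factor of 10 stores 2.34 internally as 23.
At query time ES multiplies the query value by 10, rounds it, and compares the result with 23; if they match, the document is returned with its original value 2.34.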
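
The storage and matching behaviour described above can be imitated with a short Python sketch. It models only the arithmetic (multiply, round, compare); to_scaled is a hypothetical helper, and rounding half up is assumed.

```python
import math


def to_scaled(value, factor):
    """Multiply by the factor and round half up to an integer (assumed rounding mode)."""
    return math.floor(value * factor + 0.5)


stored = to_scaled(2.34, 10)
print(stored)  # 23

# Any query value that scales to 23 matches the stored document.
for query in (2.34, 2.3, 2.26, 2.4):
    print(query, to_scaled(query, 10) == stored)

# A factor that is too small drops digits: 300.03 with factor 10 becomes 3000.
print(to_scaled(300.03, 10) / 10)  # 300.0
```

With a factor of 10, the queries 2.34, 2.3 and 2.26 all match while 2.4 does not, which is why the factor has to cover every decimal place you need to distinguish.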



Original article: http://outofmemory.cn/web/1295027.html
