Elasticsearch Basic Operations (Notes)

Elasticsearch: a search engine that also works as a non-relational (NoSQL) database

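Elasticsearch is likewise built on Lucene, a full-text search library. At its core it also stores data, and many of its concepts have MySQL counterparts.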

Concept mapping:

Elasticsearch			MySQL
indices (indexes)		database
type (deprecated since 7.x)	table
document			row
field				column
mapping				schema

Index operations

	List all indices: GET /_cat/indices?v
	View one index: GET /index_name
				Example: GET /atguigu
	Create an index:
		PUT /index_name
		{
			"settings": {
				"number_of_shards": 3,
				"number_of_replicas": 2
			}
		}
		Example:
		PUT /atguigu2
		{
		  "settings": {
			"number_of_shards": 3,
			"number_of_replicas": 2
		  }
		}
	Delete an index:
		DELETE /index_name
	Example: DELETE /atguigu
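The create-index example asks for 3 primary shards with 2 replicas each, so the cluster has to place 3 × (1 + 2) = 9 shard copies. Elasticsearch never allocates a replica on the same node as its primary, so on a single-node cluster the replicas stay unassigned and the index health is yellow. A minimal Python sketch of that arithmetic plus the request body; the helper names are hypothetical and nothing here contacts a cluster:

```python
import json


def index_settings_body(shards: int, replicas: int) -> dict:
    """Build the JSON body for PUT /<index_name> (hypothetical helper)."""
    return {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}


def total_shard_copies(shards: int, replicas: int) -> int:
    """Each primary shard plus each of its replicas counts as one shard copy."""
    return shards * (1 + replicas)


if __name__ == "__main__":
    print(json.dumps(index_settings_body(3, 2)))  # body for PUT /atguigu2
    print(total_shard_copies(3, 2))               # 9
```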

Mapping operations:

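type (field type): string (text, keyword), numeric (long, integer, float, double), date, boolean. A text field is analyzed into terms; a keyword field is indexed as one exact value.
index (whether the field is indexed): set it according to whether you search on the field; defaults to true
store (whether the field is stored separately): set it according to whether the field should appear in the results. Even with false, ES still keeps the value in _source; true keeps an extra, separate copy
analyzer: here the ik analyzer is used, either `ik_max_word` or `ik_smart`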

Creating mapping fields

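(The same request adds new fields to an existing mapping; the type of a field that already exists generally cannot be changed this way.)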

	PUT /index_name/_mapping
	{
		"properties": {
			"field_name": {
				"type": "field_type",
				"index": true,
				"store": false,
				"analyzer": "analyzer_name"
			}
		}
	}
	
	Example:
	PUT /atguigu/_mapping
	{
	  "properties": {
		"title": {
		  "type": "text",
		  "index": true,
		  "store": false,
		  "analyzer": "ik_max_word"
		},
		"images": {
		  "type": "keyword",
		  "index": false
		},
		"price": {
		  "type": "double"
		}
	  }
	}
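The mapping body is just nested JSON, so it is easy to assemble and sanity-check in code. A minimal Python sketch, assuming each field is described by a dict of mapping parameters; `build_mapping` is a hypothetical helper, and its only check reflects the Elasticsearch rule that the analyzer parameter applies to text fields:

```python
import json


def build_mapping(fields: dict) -> dict:
    """Assemble a PUT /<index_name>/_mapping body from {field_name: options}."""
    properties = {}
    for name, options in fields.items():
        # analyzer only makes sense on analyzed (text) fields
        if "analyzer" in options and options.get("type") != "text":
            raise ValueError(f"{name}: analyzer is only valid on text fields")
        properties[name] = dict(options)
    return {"properties": properties}


if __name__ == "__main__":
    body = build_mapping({
        "title": {"type": "text", "index": True, "store": False, "analyzer": "ik_max_word"},
        "images": {"type": "keyword", "index": False},
        "price": {"type": "double"},
    })
    print(json.dumps(body, indent=2))  # same structure as the atguigu example
```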

Viewing the mapping

Syntax:
GET /index_name/_mapping
Example:
GET /atguigu/_mapping

Documents: CRUD

With the index, type, and mapping in place, we can create, read, update, and delete documents.

Create/update (the whole document is overwritten):

Syntax:
		POST /index_name/_doc/{id}
		{
			"field_name": "field_value"
		}
Example:
		POST /atguigu/_doc/1
		{
		  "title":"小米手机",
		  "images":"http://image.jd.com/12479122.jpg",
		  "price":2899,
		  "stock": 200,
		  "saleable":true,
		  "attr": {
			"category": "手机",
			"brand": "小米"
		  }
		}

Delete:

Syntax:	DELETE /index_name/_doc/{id}
Example:	DELETE /atguigu/_doc/1
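Because indexing under an existing id replaces the stored document wholesale, a second request that leaves out a field drops that field instead of keeping its old value (partial updates go through the separate _update API). A toy Python model of that rule; `ToyIndex` is hypothetical, and its return strings only mimic the `result` values (created, updated, deleted, not_found) seen in Elasticsearch responses:

```python
class ToyIndex:
    """In-memory stand-in for one index; NOT an Elasticsearch client."""

    def __init__(self) -> None:
        self.docs: dict = {}

    def index(self, doc_id: str, source: dict) -> str:
        # Mimics POST /index_name/_doc/{id}: the stored document is replaced wholesale.
        result = "updated" if doc_id in self.docs else "created"
        self.docs[doc_id] = dict(source)
        return result

    def get(self, doc_id: str) -> dict:
        return self.docs[doc_id]

    def delete(self, doc_id: str) -> str:
        # Mimics DELETE /index_name/_doc/{id}.
        if doc_id in self.docs:
            del self.docs[doc_id]
            return "deleted"
        return "not_found"


if __name__ == "__main__":
    idx = ToyIndex()
    print(idx.index("1", {"title": "小米手机", "price": 2899, "stock": 200}))
    print(idx.index("1", {"title": "小米手机", "price": 2699}))
    print(idx.get("1"))  # "stock" is gone after the overwrite
    print(idx.delete("1"))
```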

Querying documents

We have already seen the simple lookups:

Query all:

GET /{index}/_search

Get by id:

GET /{index}/_doc/{id}

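Beyond these simple lookups, search is the most complex and most powerful feature of Elasticsearch as a search engine. It covers match queries, term queries, fuzzy queries, compound queries, range queries, highlighting, sorting, pagination, and more.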

The basic query syntax is:

GET /index_name/_search
{
    "query":{
        "query_type":{
            "condition_name":"condition_value"
        }
    }
}

Example, query all:

GET /atguigu/_search
{
  "query": {
    "match_all": {}
  }
}

Here query is the query object, which can hold different query types

Query type:
- for example match_all, match, term, range, and so on
Query condition: its format differs by query type

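Result fields:

took: time the query took, in milliseconds
timed_out: whether the query timed out
_shards: shard information
hits: overview of the search hits
- total: total number of matching documents
- max_score: highest score among all results
- hits: array of matching documents, one element per document
  - _index: index
  - _type: document type
  - _id: document id
  - _score: document score
  - _source: the document's source data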
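When the response is consumed in code, the documents sit under hits.hits[*]._source. A minimal Python sketch that pulls the hit count and the sources out of an already-decoded response dict; it accepts both shapes of `total` that appear in this post (a plain number in older responses, a `{"value": ..., "relation": ...}` object in 7.x). The helper name is hypothetical:

```python
def summarize_hits(response: dict):
    """Return (total, [(_id, _score, _source), ...]) from a decoded search response."""
    hits = response["hits"]
    total = hits["total"]
    if isinstance(total, dict):  # 7.x shape: {"value": 10, "relation": "eq"}
        total = total["value"]
    docs = [(h["_id"], h.get("_score"), h.get("_source", {})) for h in hits["hits"]]
    return total, docs


if __name__ == "__main__":
    sample = {  # trimmed copy of the _source-filtering response shown later
        "took": 9,
        "timed_out": False,
        "hits": {
            "total": 1,
            "max_score": 1.0,
            "hits": [{"_index": "atguigu", "_id": "9", "_score": 1.0,
                      "_source": {"price": 2699, "title": "vivo手机"}}],
        },
    }
    print(summarize_hits(sample))
```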

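Match query (match)

In ES the match operator defaults to or: a document matches if it contains any of the terms the query text is analyzed into.
For more precise matching, set it to and, so that all of those terms must be present.
The basic syntax is: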

GET /index_name/_search
{
    "query":{
        "match": {
			"field_name": {
				"query": "search text",
				"operator": "and/or"
			}
		}
    }
}

Example:

GET /atguigu/_search
{
  "query": {
    "match": {
      "title": {
        "query": "小米手机",
        "operator": "and"
      }
    }
  }
}
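The operator only decides how the analyzed query terms combine. A simplified Python model, assuming the analyzer has already split the query and each document into terms (real ik output depends on its dictionary, so the term sets below are made up for illustration): with `or` a document matches if it contains any query term, with `and` it must contain all of them. Scoring is ignored and the function name is hypothetical:

```python
def match_query(doc_terms: set, query_terms: list, operator: str = "or") -> bool:
    """Simplified match semantics over pre-analyzed terms (no scoring)."""
    present = [t in doc_terms for t in query_terms]
    if operator == "and":
        return all(present)
    if operator == "or":
        return any(present)
    raise ValueError("operator must be 'and' or 'or'")


if __name__ == "__main__":
    query = ["小米", "手机"]  # assumed analysis of "小米手机"
    docs = {
        "小米手机": {"小米", "手机"},
        "华为手机": {"华为", "手机"},
        "小米电视": {"小米", "电视"},
    }
    for title, terms in docs.items():
        print(title, "or:", match_query(terms, query, "or"), "and:", match_query(terms, query, "and"))
```

With `and`, only the document containing both terms remains; with `or`, all three match.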

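Term query (term)
A term is the smallest unit the analyzer produces; the condition must be exactly one such term.
term queries are used for exact-value matching: numbers, dates, booleans, or strings that are not analyzed.
The basic syntax is: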

# single term
GET /index_name/_search
{
    "query":{
        "term": {
			"field_name": {
				"value": "term"
			}
		}
    }
}
# multiple terms
GET /index_name/_search
{
    "query":{
        "terms": {
			"field_name": [
				"term1",
				"term2"
			]
		}
    }
}

Example:

# single term
GET /atguigu/_search
{
    "query":{
        "term": {
			"title": {
				"value": "小米手机"
			}
		}
    }
}
# multiple terms
GET /atguigu/_search
{
  "query": {
  	"terms": {
		"title": [
			"小米",
			"手机"
		]
	}
  }
}
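A term query is not analyzed: its value is looked up as-is in the field's inverted index, so it only hits when the analyzer produced exactly that term. Whether the term example on "小米手机" returns anything therefore depends on whether ik emitted "小米手机" as a single token; terms matches documents containing any of the listed terms. A toy Python inverted index to illustrate, where the per-document term lists are assumptions rather than real ik output and the function names are hypothetical:

```python
from collections import defaultdict


def build_inverted_index(docs: dict) -> dict:
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def term_query(index: dict, value: str) -> set:
    # Exact lookup: the value is NOT analyzed first.
    return set(index.get(value, set()))


def terms_query(index: dict, values: list) -> set:
    # Any of the listed terms (logical OR).
    result = set()
    for value in values:
        result |= index.get(value, set())
    return result


if __name__ == "__main__":
    docs = {"1": ["小米", "手机"], "2": ["华为", "手机"]}  # assumed analyzer output
    idx = build_inverted_index(docs)
    print(term_query(idx, "小米手机"))                 # no such term, so no hits
    print(sorted(terms_query(idx, ["小米", "手机"])))
```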

Range query (range)

A range query finds documents whose numeric or date values fall within a given interval
Syntax:

"range": {
	"field_name": {
		"gt/gte": start_value,
		"lt/lte": end_value
	}
}

Example:

GET /atguigu/_search
{
    "query":{
        "range": {
            "price": {
                "gte":  1000,
                "lt":   3000
            }
    	}
    }
}
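Each bound is optional, and gt/gte and lt/lte combine independently. A minimal Python predicate that mirrors the check for a single numeric value; the function name is hypothetical, and date handling (formats, date math) is left out:

```python
import operator

# Range operator name -> comparison applied as op(value, limit)
_OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def in_range(value, bounds: dict) -> bool:
    """True if value satisfies every bound, e.g. {"gte": 1000, "lt": 3000}."""
    for op_name, limit in bounds.items():
        if op_name not in _OPS:
            raise ValueError(f"unknown range operator: {op_name}")
        if not _OPS[op_name](value, limit):
            return False
    return True


if __name__ == "__main__":
    prices = [999, 1000, 2899, 3000]
    print([p for p in prices if in_range(p, {"gte": 1000, "lt": 3000})])  # [1000, 2899]
```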

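A range query accepts the following operators:

Operator	Meaning
gt		greater than
gte		greater than or equal to
lt		less than
lte		less than or equal to

Boolean combination (bool)

A bool query is also called a compound query.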

bool combines other queries through must (AND), must_not (NOT), and should (OR)
Syntax:

"bool": {
	"must/must_not/should": [
		{},{}
	]
}

Example:

GET /atguigu/_search
{
    "query":{
        "bool":{
        	"must": [
        	  {
        	    "range": {
        	      "price": {
        	        "gte": 1000,
        	        "lte": 3000
        	      }
        	    }
        	  },
        	  {
        	    "range": {
        	      "price": {
        	        "gte": 2000,
        	        "lte": 4000
        	      }
        	    }
        	  }
        	]
        }
    }
}
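Since every must clause has to match, the two ranges in this example intersect, so only prices from 2000 to 3000 inclusive come back. A small Python sketch of the clause logic over plain predicate functions; the names are hypothetical, scoring is ignored, and should follows the Elasticsearch default of requiring at least one match only when the query has no must or filter clause:

```python
def bool_match(doc: dict, must=(), must_not=(), should=(), filters=()) -> bool:
    """Simplified bool semantics over predicate functions (scoring ignored)."""
    if not all(p(doc) for p in must):
        return False
    if not all(p(doc) for p in filters):
        return False
    if any(p(doc) for p in must_not):
        return False
    # With no must/filter clauses, at least one should clause has to match.
    if should and not must and not filters:
        return any(p(doc) for p in should)
    return True


def price_between(lo, hi):
    """Predicate equivalent to a range with gte=lo and lte=hi."""
    return lambda doc: lo <= doc["price"] <= hi


if __name__ == "__main__":
    docs = [{"price": p} for p in (1500, 2000, 2899, 3000, 3500)]
    must = [price_between(1000, 3000), price_between(2000, 4000)]
    print([d["price"] for d in docs if bool_match(d, must=must)])  # [2000, 2899, 3000]
```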

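Note: a single bool query can combine several clause types at once (for example must together with filter, as in the filter example that follows); it is not limited to one kind.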

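Filter (filter)

Every query clause affects document scoring and ranking. If you need to narrow the results but do not want the narrowing condition to affect scoring, do not put it among the query clauses; use filter instead:
Syntax: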

"bool": {
	"must": [],
	"filter": []
}

Example:

GET /atguigu/_search
{
  "query": {
    "bool": {
      "must": {
        "match": { "title": "小米手机" }
      },
      "filter": {
        "range": {
          "price": { "gt": 2000, "lt": 3000 }
        }
      }
    }
  }
}

Note: a filter can itself contain another bool query to combine several filter conditions.
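Put differently, must clauses both select and score documents, while filter clauses only select. A toy Python model, assuming a made-up score (the number of matched query terms; Elasticsearch really uses BM25), showing that the price condition in filter removes documents but leaves the remaining scores unchanged. The document terms and the function name are hypothetical:

```python
def search(docs: list, query_terms: list, price_ok) -> list:
    """Toy bool query: must = term match (scored), filter = price check (unscored)."""
    results = []
    for doc in docs:
        # Toy score: number of query terms present in the document.
        score = sum(1 for t in query_terms if t in doc["terms"])
        if score == 0:
            continue  # the must clause did not match
        if not price_ok(doc["price"]):
            continue  # the filter clause removes the doc but never changes a score
        results.append((doc["id"], score))
    # sorted() is stable, so equal scores keep their input order
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    docs = [
        {"id": "1", "terms": {"小米", "手机"}, "price": 2899},
        {"id": "2", "terms": {"小米", "电视"}, "price": 2599},
        {"id": "3", "terms": {"华为", "手机"}, "price": 3999},
    ]
    print(search(docs, ["小米", "手机"], lambda p: 2000 < p < 3000))
    print(search(docs, ["小米", "手机"], lambda p: True))
```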

Sorting (sort)

sort orders results by one or more fields, with order specifying the direction
Syntax:

"sort": [
	{
		"field_name": {
			"order": "asc/desc"
		}
	}
]

Example:

GET /atguigu/_search
{
  "query": {
    "match": {
      "title": "小米手机"
    }
  },
  "sort": [
    {
      "price": { "order": "desc" }
    },
    {
      "_score": { "order": "desc"}
    }
  ]
}
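Sort keys apply in order: price descending first, and _score only breaks ties between equal prices. The same ordering in Python over made-up hit dicts, using a tuple key with negated numbers to get descending order on both keys; the function name is hypothetical:

```python
def sort_hits(hits: list) -> list:
    """Order by _source.price desc, then _score desc, like the sort array above."""
    return sorted(hits, key=lambda h: (-h["_source"]["price"], -h["_score"]))


if __name__ == "__main__":
    hits = [  # made-up hits
        {"_id": "a", "_score": 1.2, "_source": {"price": 2899}},
        {"_id": "b", "_score": 0.8, "_source": {"price": 3999}},
        {"_id": "c", "_score": 1.5, "_source": {"price": 2899}},
    ]
    print([h["_id"] for h in sort_hits(hits)])  # ['b', 'c', 'a']
```

Negation only works for numeric keys; for string keys you would sort in two stable passes instead.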

Pagination (from/size)

Syntax:

from: (pageNum - 1) * pageSize
size: pageSize

Example:

GET /atguigu/_search
{
  "query": {
    "match": {
      "title": "小米手机"
    }
  },
  "from": 2,
  "size": 2
}

from: how many hits to skip, i.e. the 0-based offset of the first hit returned

size: how many hits to return
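from is only the number of hits to skip, so a 1-based page number maps to it with the formula above; the example's from: 2, size: 2 is page 2 at 2 hits per page. By default Elasticsearch rejects requests where from + size exceeds 10,000 (the index.max_result_window setting). A small Python helper; its name and the 1-based page numbering are assumptions:

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def page_params(page_num: int, page_size: int) -> dict:
    """Translate a 1-based page number into the from/size request fields."""
    if page_num < 1 or page_size < 1:
        raise ValueError("page_num and page_size must be >= 1")
    start = (page_num - 1) * page_size
    if start + page_size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds index.max_result_window")
    return {"from": start, "size": page_size}


if __name__ == "__main__":
    print(page_params(2, 2))  # {'from': 2, 'size': 2}, as in the example
```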

Highlighting (highlight)

Look at how Baidu highlights its search results:

[Figure: Baidu search results with highlighted keywords; image not available]

Observation: highlighting simply wraps the keywords in a tag, and the front end then styles that tag.

Syntax:

"highlight": {
	"fields": {"field_name":{}},
	"pre_tags": "<em>",
	"post_tags": "</em>"
}

Example:

GET /atguigu/_search
{
  "query": {
    "match": {
      "title": "小米"
    }
  },
  "highlight": {
    "fields": {"title": {}}, 
    "pre_tags": "<em>",
    "post_tags": "</em>"
  }
}

fields: the fields to highlight

pre_tags: opening tag

post_tags: closing tag

Query result:

[Figure: the query result with the highlighted title; image not available]
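Conceptually, the highlighter returns the field text with each matched term wrapped in pre_tags/post_tags (Elasticsearch's defaults are `<em>` and `</em>`), and the page then styles that tag. A much-simplified Python illustration, assuming the matched terms are already known; real highlighters work from analyzed tokens and their offsets rather than plain string matching, and the function name is hypothetical:

```python
import re


def highlight(text: str, terms: list, pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap each occurrence of the given terms in pre/post tags (toy version)."""
    if not terms:
        return text
    # Longest terms first, so a longer term wins over a shorter one it contains.
    pattern = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.sub(pattern, lambda m: f"{pre}{m.group(0)}{post}", text)


if __name__ == "__main__":
    print(highlight("小米手机 小米电视", ["小米"]))
```

A single regex pass is used instead of repeated str.replace calls so that text already inside a tag is never wrapped twice.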

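Source filtering (_source)

By default, Elasticsearch returns every field stored in _source for each hit.

To return only some fields, add a _source filter:
use includes to return only the listed fields
use excludes to return every field except the listed ones
Syntax: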

"_source": {
	"includes/excludes": ["field1", "field2"]
}

Example:

GET /atguigu/_search
{
  "_source": ["title","price"],
  "query": {
    "term": {
      "price": 2699
    }
  }
}

The result contains only those two fields (a plain array, as used here, is shorthand for includes):

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "atguigu",
        "_type" : "goods",
        "_id" : "9",
        "_score" : 1.0,
        "_source" : {
          "price" : 2699,
          "title" : "vivo手机"
        }
      }
    ]
  }
}
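The includes/excludes rule can be mimicked locally. A minimal Python sketch that handles top-level field names only (Elasticsearch also accepts wildcards and dotted paths such as attr.brand, which this skips); the function name is hypothetical and the sample document is the one indexed earlier as id 1:

```python
def filter_source(source: dict, includes=None, excludes=None) -> dict:
    """Apply includes (if given), then excludes, to top-level field names."""
    result = {k: v for k, v in source.items() if includes is None or k in includes}
    for key in excludes or ():
        result.pop(key, None)
    return result


if __name__ == "__main__":
    doc = {
        "title": "小米手机",
        "images": "http://image.jd.com/12479122.jpg",
        "price": 2899,
        "stock": 200,
        "saleable": True,
        "attr": {"category": "手机", "brand": "小米"},
    }
    print(filter_source(doc, includes=["title", "price"]))
    print(filter_source(doc, excludes=["images", "attr"]))
```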

3. Aggregations (aggregations)

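Aggregations make statistics and analysis over the data very convenient. For example:

Which phone brand is the most popular?
What are the average, highest, and lowest prices of these phones?
How do these phones sell month by month?

These statistics are much easier to express than in database SQL, and they run fast enough to give near-real-time results.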

3.1. Basic concepts

Elasticsearch offers many kinds of aggregations; the two most common are buckets and metrics:

Buckets (bucket)

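A bucket aggregation groups data in some way, and each group is called a bucket in ES. For example, grouping people by nationality gives a China bucket, a UK bucket, a Japan bucket, and so on; grouping by age range gives 0-10, 10-20, 20-30, 30-40, and so on.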

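Elasticsearch provides many ways to form buckets:

Date Histogram Aggregation: groups by date interval; with an interval of one week, each week becomes one bucket
Histogram Aggregation: groups by numeric interval, like the date version
Terms Aggregation: groups by term; documents with exactly the same term fall into the same bucket
Range Aggregation: groups numbers or dates into ranges, each with a specified start and end
……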

Bucket aggregations only group data and compute nothing, so a bucket usually nests another kind of aggregation: metrics aggregations

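Metrics (metrics)

Once the data is grouped, we usually compute something over each group, such as an average, maximum, minimum, or sum. In ES these are called metrics.

Commonly used metrics aggregations:

Avg Aggregation: average
Max Aggregation: maximum
Min Aggregation: minimum
Percentiles Aggregation: percentiles
Stats Aggregation: returns avg, max, min, sum, and count at once
Sum Aggregation: sum
Top hits Aggregation: the top matching documents
Value Count Aggregation: number of values
……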

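3.2. Aggregating into buckets

First, we bucket the phones by brand, using attr.brand.keyword (attr was not declared in the mapping, so dynamic mapping indexed attr.brand as text with a keyword sub-field named attr.brand.keyword)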

GET /atguigu/_search
{
    "size" : 0,
    "aggs" : { 
        "brands" : { 
            "terms" : { 
              "field" : "attr.brand.keyword"
            }
        }
    }
}

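size: number of hits to return; set to 0 because we only care about the aggregation results, not the matching documents, which is more efficient
aggs: declares an aggregation; short for aggregations
- brands: a name for this aggregation, chosen freely
  - terms: how the buckets are formed, here by term
    - field: the field to bucket on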

Result:

{
  "took" : 23,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "brands" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "华为",
          "doc_count" : 4
        },
        {
          "key" : "小米",
          "doc_count" : 4
        },
        {
          "key" : "oppo",
          "doc_count" : 1
        },
        {
          "key" : "vivo",
          "doc_count" : 1
        }
      ]
    }
  }
}

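hits: empty, because size is 0
aggregations: the aggregation results
brands: the aggregation name we defined
buckets: the buckets found; each distinct brand value forms one bucket
- key: the brand value for this bucket
- doc_count: number of documents in this bucket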
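A terms aggregation is essentially a group-by-count on exact (keyword) values. A plain-Python equivalent over in-memory documents, with made-up sample data and hypothetical helper names. Two Elasticsearch details it mirrors only loosely: by default a terms aggregation returns the 10 largest buckets (its size parameter), and because each shard counts separately the counts can be approximate, which is what doc_count_error_upper_bound reports:

```python
from collections import Counter


def get_path(doc: dict, path: str):
    """Follow a dotted path such as "attr.brand"; None if it is missing."""
    value = doc
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def terms_agg(docs: list, path: str, size: int = 10) -> list:
    """Group-by-count on exact values, largest buckets first (ties by key here)."""
    counts = Counter(v for v in (get_path(d, path) for d in docs) if v is not None)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": c} for k, c in ordered[:size]]


if __name__ == "__main__":
    docs = [  # made-up sample data
        {"attr": {"brand": "小米"}},
        {"attr": {"brand": "华为"}},
        {"attr": {"brand": "小米"}},
        {"title": "no brand"},
    ]
    print(terms_agg(docs, "attr.brand"))
```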

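3.3. Metrics inside buckets

The previous example tells us how many documents each bucket holds, which is useful. Usually, though, an application needs richer metrics, for example: what is the average price of each brand?

So we tell Elasticsearch which field to use and which metric to compute. This goes inside the bucket aggregation, and the metric is computed over the documents in each bucket.

Now let's add an average-price metric to the previous aggregation: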

GET /atguigu/_search
{
    "size" : 0,
    "aggs" : { 
        "brands" : { 
            "terms" : { 
              "field" : "attr.brand.keyword"
            },
            "aggs":{
                "avg_price": { 
                   "avg": {
                      "field": "price" 
                   }
                }
            }
        }
    }
}

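aggs: a new aggs added inside the previous aggregation (brands); a metric is itself an aggregation
avg_price: the aggregation name
avg: the metric type, here the average
field: the field the metric is computed on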

Result:

{
  "took" : 82,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "brands" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "华为",
          "doc_count" : 4,
          "avg_price" : {
            "value" : 3999.0
          }
        },
        {
          "key" : "小米",
          "doc_count" : 4,
          "avg_price" : {
            "value" : 3499.0
          }
        },
        {
          "key" : "oppo",
          "doc_count" : 1,
          "avg_price" : {
            "value" : 2799.0
          }
        },
        {
          "key" : "vivo",
          "doc_count" : 1,
          "avg_price" : {
            "value" : 2699.0
          }
        }
      ]
    }
  }
}

Each bucket now has its own avg_price field, which is the result of the metric aggregation
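The metric is computed separately over each bucket's documents. A plain-Python sketch of terms plus a nested avg, using made-up documents and a hypothetical function name; documents without a price are counted in the bucket but left out of the average, and a bucket with no prices gets None (Elasticsearch reports null there):

```python
from collections import defaultdict


def brand_avg_price(docs: list) -> list:
    """terms on attr.brand with a nested avg on price (simplified)."""
    counts = defaultdict(int)
    prices = defaultdict(list)
    for doc in docs:
        brand = doc.get("attr", {}).get("brand")
        if brand is None:
            continue  # no bucket for documents without a brand
        counts[brand] += 1
        if "price" in doc:  # avg skips documents without the field
            prices[brand].append(doc["price"])
    buckets = []
    for brand in sorted(counts, key=lambda b: (-counts[b], b)):
        values = prices[brand]
        avg = sum(values) / len(values) if values else None
        buckets.append({"key": brand, "doc_count": counts[brand], "avg_price": {"value": avg}})
    return buckets


if __name__ == "__main__":
    docs = [  # made-up sample data
        {"attr": {"brand": "小米"}, "price": 2899},
        {"attr": {"brand": "小米"}, "price": 1999},
        {"attr": {"brand": "华为"}, "price": 3999},
    ]
    print(brand_avg_price(docs))
```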

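3.4. Buckets inside buckets

In the previous example we nested a metric inside a bucket. A bucket can also nest other buckets, i.e. each group is split into further groups.

For example, to see which kinds of products each brand makes, we bucket again by attr.category.keyword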

GET /atguigu/_search
{
    "size" : 0,
    "aggs" : { 
        "brands" : { 
            "terms" : { 
              "field" : "attr.brand.keyword"
            },
            "aggs":{
                "avg_price": { 
                   "avg": {
                      "field": "price" 
                   }
                },
                "categorys": {
                  "terms": {
                    "field": "attr.category.keyword"
                  }
                }
            }
        }
    }
}

Partial result:

{
  "took" : 27,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "brands" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "华为",
          "doc_count" : 4,
          "categorys" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "手机",
                "doc_count" : 3
              },
              {
                "key" : "笔记本",
                "doc_count" : 1
              }
            ]
          },
          "avg_price" : {
            "value" : 3999.0
          }
        },
        {
          "key" : "小米",
          "doc_count" : 4,
          "categorys" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "手机",
                "doc_count" : 2
              },
              {
                "key" : "电视",
                "doc_count" : 1
              },
              {
                "key" : "笔记本",
                "doc_count" : 1
              }
            ]
          },
          "avg_price" : {
            "value" : 3499.0
          }
        },
        {
          "key" : "oppo",
          "doc_count" : 1,
          "categorys" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "手机",
                "doc_count" : 1
              }
            ]
          },
          "avg_price" : {
            "value" : 2799.0
          }
        },
        {
          "key" : "vivo",
          "doc_count" : 1,
          "categorys" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "手机",
                "doc_count" : 1
              }
            ]
          },
          "avg_price" : {
            "value" : 2699.0
          }
        }
      ]
    }
  }
}

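The new categorys aggregation is nested inside each brands bucket.
Within each brand, the documents are grouped again by attr.category.keyword.
From this we can read:
- Huawei has 4 products
- the average price of Huawei products is 3999.0
- 3 of them are phones and 1 is a laptop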
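The two-level grouping is a nested group-by-count. A plain-Python sketch over made-up documents; the function name is hypothetical, and, as in Elasticsearch, a document without a category still counts toward its brand bucket but lands in no category bucket:

```python
from collections import Counter, defaultdict


def nested_terms(docs: list) -> list:
    """terms on attr.brand, then terms on attr.category inside each brand."""
    outer = Counter()
    inner = defaultdict(Counter)
    for doc in docs:
        attr = doc.get("attr", {})
        brand = attr.get("brand")
        if brand is None:
            continue
        outer[brand] += 1
        category = attr.get("category")
        if category is not None:
            inner[brand][category] += 1
    result = []
    for brand, count in sorted(outer.items(), key=lambda kv: (-kv[1], kv[0])):
        sub = sorted(inner[brand].items(), key=lambda kv: (-kv[1], kv[0]))
        result.append({
            "key": brand,
            "doc_count": count,
            "categorys": [{"key": c, "doc_count": n} for c, n in sub],
        })
    return result


if __name__ == "__main__":
    docs = [  # made-up sample data
        {"attr": {"brand": "华为", "category": "手机"}},
        {"attr": {"brand": "华为", "category": "手机"}},
        {"attr": {"brand": "华为", "category": "笔记本"}},
        {"attr": {"brand": "小米", "category": "电视"}},
    ]
    print(nested_terms(docs))
```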

Feel free to share; please credit the source when reposting: 内存溢出 (outofmemory.cn)

Original article: http://outofmemory.cn/zaji/5653567.html
