Elasticsearch Study Notes

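Full-text search is a common requirement, and the open-source Elasticsearch (ES) is currently the first choice among full-text search engines. It can store, search, and analyze large volumes of data quickly; Wikipedia, Stack Overflow, and GitHub all use it. Under the hood ES is built on the open-source library Lucene. Lucene cannot be used on its own as a service: you have to write code that calls its interfaces. ES wraps Lucene and exposes a REST API, so it works out of the box.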

Official documentation: Elasticsearch Guide [7.16] | Elastic

Chinese documentation: Introduction to Elasticsearch · Elasticsearch 7.6 Chinese documentation · Kancloud (看云)

01 Basic Concepts

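Index: as a verb, similar to a database INSERT; as a noun, similar to a database.
Type: similar to a database table. Data is stored under a type within an index, much like a row in a table of a database. Types are deprecated starting with 7.x: fields with the same name in different types of the same index are handled identically at the Lucene level, so conflicting definitions of such a field break searches. Removing types keeps Lucene efficient.
Document: a single piece of data, in JSON format, stored under an index (and, before types were removed, under a type).
Inverted index: the lookup mechanism ES uses, mapping each term to the documents that contain it.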
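
The inverted index idea can be sketched in a few lines of Java. This is a toy illustration only: the class name InvertedIndexSketch, the crude split-and-lowercase tokenizer, and the sample addresses are assumptions made for the example, and Lucene's real data structures are far more elaborate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index (hypothetical class): maps each term to the sorted ids
// of the documents that contain it.
class InvertedIndexSketch {
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // Lowercase the text and split on anything that is not a letter or digit,
    // a crude stand-in for a real analyzer.
    public void add(int docId, String text) {
        for (String term : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Return the ids of the documents containing the term, in ascending order.
    public List<Integer> search(String term) {
        Set<Integer> ids = postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
        return new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Mill Road");
        index.add(2, "Mill Lane");
        index.add(3, "Holmes Lane");
        System.out.println(index.search("mill"));
        System.out.println(index.search("lane"));
    }
}
```

Looking up a term is a single map access followed by reading its posting list, which is why searching stays fast even when the number of documents grows.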

02 Installation

Installing elasticsearch 7.8.0 and kibana 7.8.0 on Windows (Bingmous's blog, CSDN)

03 Basic Retrieval

Every ES operation is exposed through the REST API, so all you need to do is send requests.

_cat

GET /_cat/nodes     # list all nodes
GET /_cat/health    # show cluster health
GET /_cat/master    # show the master node
GET /_cat/indices   # list all indices
GET /_cat/indices?v # with the v parameter: verbose output including column headers

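Index a document (save)

# Index a document. PUT is normally used to modify data and returns an error if no id is given
PUT /customer/_doc/1    # Re-sending increments the version. PUT and POST both work; POST may omit the id, which is then auto-generated
{
  "name":"john"
}
# Create with _create
PUT /customer/_create/6    # Succeeds only once per id. PUT and POST both work
{
  "name":"john"
}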

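Get a document

GET /customer/_doc/6
# Result: fields prefixed with _ are metadata
{
  "_index" : "customer",    # which index
  "_type" : "_doc",         # which type
  "_id" : "6",              # id
  "_version" : 1,           # version number
  "_seq_no" : 27,           # sequence number for concurrency control; advances with each write and is used for optimistic locking
  "_primary_term" : 1,      # also used for concurrency control; changes when the primary shard is reassigned, e.g. after a restart
  "found" : true,
  "_source" : {            # the data actually stored
    "name" : "john"
  }
}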

# Optimistic-lock update: pass ?if_seq_no=27&if_primary_term=1; the write is applied only if both match the document's current values, otherwise it is rejected
PUT /customer/_doc/6?if_seq_no=27&if_primary_term=1
{
  "name":"john"
}
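
The compare-then-write behavior behind if_seq_no can be imitated in memory. The sketch below is not Elasticsearch code: SeqNoGuardSketch and its methods are hypothetical names, the primary term is left out, and bumping the counter by one per document is a simplification of what the cluster does.

```java
// In-memory imitation of a write guarded by a sequence number (hypothetical class).
// It mirrors only the if_seq_no comparison; primary terms, shards, and HTTP are left out.
class SeqNoGuardSketch {
    private String source;
    private long seqNo;

    public SeqNoGuardSketch(String source, long seqNo) {
        this.source = source;
        this.seqNo = seqNo;
    }

    // Apply the write only if the caller's expected sequence number is still current.
    public synchronized boolean updateIf(long expectedSeqNo, String newSource) {
        if (expectedSeqNo != seqNo) {
            return false; // Elasticsearch itself rejects such a request with HTTP 409
        }
        source = newSource;
        seqNo++; // simplification: the real counter advances per shard operation
        return true;
    }

    public synchronized long getSeqNo() {
        return seqNo;
    }

    public synchronized String getSource() {
        return source;
    }

    public static void main(String[] args) {
        SeqNoGuardSketch doc = new SeqNoGuardSketch("{\"name\":\"john\"}", 27);
        // Two writers both read seq_no 27; only the first write is applied.
        System.out.println(doc.updateIf(27, "{\"name\":\"Jane\"}"));
        System.out.println(doc.updateIf(27, "{\"name\":\"Bob\"}"));
        System.out.println(doc.getSeqNo());
    }
}
```

The losing writer is expected to re-read the document, pick up the new sequence number, and retry, which is the usual optimistic-locking loop.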

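Update a document

# Partial update: _update compares the request with the stored document; if nothing changes, the result is noop and nothing is written
POST customer/_update/6
{
  "doc":{        # doc holds the partial document
    "name":"Jane",
    "age":18
  }
}
# Alternatively, index a new document with the same id, which overwrites the old one
# Summary: under high concurrency with only occasional updates, use _update (which recomputes the allocation rules); under high concurrency with frequent updates, overwrite directly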
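
The noop behavior can be pictured as a merge followed by a comparison. The following is a simplified sketch rather than the server's implementation: PartialUpdateSketch is a hypothetical name, and the merge is shallow (top-level keys only), whereas nested objects need a recursive merge.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper imitating the outcome of a partial update on a flat document.
class PartialUpdateSketch {
    // Merge the partial document into the stored one.
    // Returns "noop" when the merge changes nothing, otherwise "updated".
    public static String update(Map<String, Object> stored, Map<String, Object> partial) {
        Map<String, Object> merged = new HashMap<>(stored);
        merged.putAll(partial);
        if (merged.equals(stored)) {
            return "noop"; // nothing to write
        }
        stored.clear();
        stored.putAll(merged);
        return "updated";
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new HashMap<>();
        stored.put("name", "john");

        Map<String, Object> partial = new HashMap<>();
        partial.put("name", "Jane");
        partial.put("age", 18);

        System.out.println(update(stored, partial)); // first call changes the document
        System.out.println(update(stored, partial)); // same request again changes nothing
    }
}
```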

Delete a document or an index

DELETE customer/_doc/5    # delete one document
DELETE customer           # delete the whole index

Bulk API

# Syntax
{action:{metadata}}    # action is one of index, delete, create, update
{requestbody}

PUT /bank/_bulk                        # POST also works
{"index":{"_id":"1"}}                  # action
{"account_number":1,"balance":39225}   # data

PUT /customer/_bulk
{"index":{"_id":"1"}}    # index adds the document; the next line is its data, and the two lines form one unit
{"name":"John Doe"}
{"index":{"_id":"2"}}
{"name":"Jane"}

# A more complex example that operates across the whole cluster
PUT /_bulk
{"delete":{"_index":"webset","_type":"blog","_id":"123"}}
{"create":{"_index":"webset","_type":"blog","_id":"123"}}
{"title":"My first blog post"}
{"index":{"_index":"webset","_type":"blog"}}
{"title":"My second blog post"}
{"update":{"_index":"webset","_type":"blog","_id":"123"}}
{"doc":{"title":"My updated blog post"}}

# Bulk-import the official sample data (use the contents of accounts.json as the request body)
https://github.com/elastic/elasticsearch/blob/7.4/docs/src/test/resources/accounts.json
PUT /bank/account/_bulk
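
Because a bulk body is newline-delimited JSON, it is easy to get wrong by hand. As a minimal sketch, the helper below assembles the action/data line pairs for index actions; BulkBodySketch and indexActions are hypothetical names, the ids are assumed to need no JSON escaping, and each source value is assumed to be a single-line JSON object already.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that assembles the newline-delimited body of a bulk request.
class BulkBodySketch {
    // For every document emit one action line and one source line.
    // The body ends with a newline, which the bulk endpoint requires.
    public static String indexActions(Map<String, String> sourcesById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : sourcesById.entrySet()) {
            body.append("{\"index\":{\"_id\":\"").append(entry.getKey()).append("\"}}\n");
            body.append(entry.getValue()).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the documents in insertion order.
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"name\":\"John Doe\"}");
        docs.put("2", "{\"name\":\"Jane\"}");
        System.out.print(indexActions(docs));
    }
}
```

The resulting string can be sent as the body of a request to /customer/_bulk with the content type application/x-ndjson.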

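04 Advanced Retrieval: Search API

ES supports two basic ways of searching

Send the search parameters in the REST request URI (URI + request parameters)
Send them in the REST request body (URI + request body)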

# Search through the request URI
GET bank/_search?q=*&sort=account_number:desc

# Search through the request body; the body is written in Query DSL
GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "account_number": {    # can be shortened to "account_number": "desc"
        "order": "desc"
      }
    }
  ]
}
# Returns 10 hits by default; each document is in the _source of an entry in the hits.hits array

Query DSL

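Basic syntax

# ES provides a JSON-style DSL (domain-specific language) for queries, called Query DSL. It is very comprehensive and can feel complicated at first; the best way to learn it is to start from a few basic examples.
# Typical structure
{
    query_name:{    # root operation: what to do; the same structure applies when targeting a field
        argument:value,
        argument:value,...
    }
}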

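Search body elements (including returning selected fields)

GET bank/_search
query # defines how to search
    match_all       # matches every document
    match           # [fuzzy match] the input is analyzed and results are sorted by relevance score; the keyword sub-field matches the field's entire content. Use match for full-text search and term for non-text fields; only text fields have a keyword sub-field
    term            # [exact match] e.g. integer values. Use match for text fields: they are analyzed, which makes term lookups on them very difficult
    match_phrase    # [phrase match] the terms must appear together and in order, so the phrase is matched as a whole
    multi_match     # [multi-field match] matches within several given fields
    bool     # [compound query] combines multiple queries
        must     # [must match] contributes to the relevance score; range can be used inside any of these clauses
        must_not # [must not match]
        should   # [should match]
        filter   # [filter] does not contribute to the relevance score
sort    # [sorting] takes an array; later fields order the documents that are equal on earlier fields
from    # [offset of the first hit]
size    # [maximum number of hits] together with from it implements paging; default 10; size 0 returns no hits
_source # [return selected fields]
aggs    # [aggregate over the matched documents]
    terms    # [value frequency] size limits the result to the top N buckets
    avg      # [average of a field]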
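
To make the terms and avg entries concrete, here is a small in-memory imitation of what they compute. It is only a sketch: AggregationSketch is a hypothetical class, the ages are made up, ties between equally frequent values are broken by ascending key as a choice of this example, and an empty input yields NaN here, which is not how the server reports a missing average.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the terms and avg aggregations.
class AggregationSketch {
    // Count how often each value occurs and keep the `size` most frequent buckets.
    public static Map<Integer, Long> terms(List<Integer> values, int size) {
        Map<Integer, Long> counts = new TreeMap<>(); // keys in ascending order
        for (Integer value : values) {
            counts.merge(value, 1L, Long::sum);
        }
        List<Map.Entry<Integer, Long>> buckets = new ArrayList<>(counts.entrySet());
        // Stable sort by descending count, so equal counts stay in ascending key order.
        buckets.sort(Map.Entry.<Integer, Long>comparingByValue().reversed());

        Map<Integer, Long> top = new LinkedHashMap<>();
        for (Map.Entry<Integer, Long> bucket : buckets.subList(0, Math.min(size, buckets.size()))) {
            top.put(bucket.getKey(), bucket.getValue());
        }
        return top;
    }

    // Arithmetic mean of the values; NaN for an empty list (a choice of this sketch).
    public static double avg(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(38, 28, 38, 32, 28, 38);
        System.out.println(terms(ages, 2));
        System.out.println(avg(Arrays.asList(10, 20, 30)));
    }
}
```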

Mapping

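Field types. From 7.x onward the concept of a type is dropped. (1) Two fields with the same name under different types of the same index are one and the same field inside ES, so identical field mappings would have to be defined in both types; otherwise the same field name in different types conflicts, which lowers Lucene's processing efficiency. (2) Types were removed precisely to improve how efficiently ES processes data.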

# View the mapping
GET bank/_mapping

# Create a mapping: keyword is not analyzed, text is analyzed; index (whether the field is searchable) defaults to true
PUT my_index
{
  "mappings": {
    "properties": {
      "age":{"type": "integer"},
      "email":{"type": "keyword"},
      "name":{"type": "text","index": true}
    }
  }
}
# Add a field to the mapping
PUT my_index/_mapping
{
  "properties":{
    "employee_id": {
      "type":"long",
      "index": false
    }
  }
}
# The mapping of an existing field cannot be changed; migrate the data to a new index instead
POST _reindex
{
  "source": {
    "index": "bank"
  },
  "dest": {
    "index": "newbank"
  }
}

Analysis (tokenization)

Install ik: download it from https://github.com/medcl/elasticsearch-analysis-ik

# Unzip it directly into elasticsearch's plugins directory
# List the installed plugins
bin/elasticsearch-plugin.bat list

Testing ik

# The standard analyzer cannot segment Chinese into words
POST _analyze
{
  "analyzer": "standard",
  "text": ["hello你好"]
}
# Use the ik analyzers and compare the two results
POST _analyze
{
  "analyzer": "ik_smart",
  "text": ["我是中国人"]
}
POST _analyze
{
  "analyzer": "ik_max_word",
  "text": ["我是中国人"]
}

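Custom dictionary: serve a custom dictionary through nginx; see the appendix for installing nginx

# Configure the dictionary in config/IKAnalyzer.cfg.xml under the ik installation directory; the configured address only needs to return the custom words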

05 ElasticSearch-Rest-Client

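API reference: Elasticsearch Clients | Elastic

Import the dependencies and check the version of each one. Because Spring Boot manages the ES version, make sure the elasticsearch version matches that of elasticsearch-rest-high-level-client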



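<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.3.4.RELEASE</version>
        <relativePath/>
    </parent>
    <groupId>com.bingmous</groupId>
    <artifactId>esRestClient</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>esRestClient</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>11</java.version>
        <!-- override the Elasticsearch version managed by Spring Boot -->
        <elasticsearch.version>7.8.0</elasticsearch.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>7.8.0</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
            <exclusions>
                <exclusion>
                    <groupId>org.junit.vintage</groupId>
                    <artifactId>junit-vintage-engine</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>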

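Configuration class: registers a RestHighLevelClient in the container and defines the options shared by all requests

import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchConfig {

    // Request options shared by every call made through the client
    public static final RequestOptions COMMON_OPTIONS;
    static {
        RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
//        builder.addHeader("Authorization", "Bearer " + TOKEN);
//        builder.setHttpAsyncResponseConsumerFactory(
//                new HttpAsyncResponseConsumerFactory
//                        .HeapBufferedResponseConsumerFactory(30 * 1024 * 1024));
        COMMON_OPTIONS = builder.build();
    }

    @Bean
    public RestHighLevelClient esRestClient() {
        //1 First obtain a RestClientBuilder
        RestClientBuilder builder = RestClient.builder(new HttpHost("localhost",9200,"http"));
        //2 Then build the client from the RestClientBuilder
        RestHighLevelClient client = new RestHighLevelClient(builder);
        return client;
    }
}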

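Main class

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.ConfigurableApplicationContext;

@SpringBootApplication
public class MainApplication {
    public static void main(String[] args) {
        ConfigurableApplicationContext run = SpringApplication.run(MainApplication.class, args);
        // The bean name is the name of the @Bean method, esRestClient
//        RestHighLevelClient esClient = run.getBean("esRestClient", RestHighLevelClient.class);
//        System.out.println(esClient);
    }
}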

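Test class

Run the unit tests through Spring's test support; the @RunWith line is only needed with JUnit 4. User and Account are plain entity classes that are not shown here, and JSON comes from fastjson.
import com.alibaba.fastjson.JSON;
import java.io.IOException;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.Avg;
import org.elasticsearch.search.aggregations.metrics.AvgAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

//@RunWith(SpringRunner.class)
@SpringBootTest
public class EsRestClientApplicationTests {

    @Autowired
    private RestHighLevelClient esRestClient; // the esRestClient bean is injected automatically

    // Check that the injection worked
    @Test
    public void contextLoads() {
        System.out.println(esRestClient);
    }

    // Index data: save or update a document
    @Test
    public void indexData() throws IOException {
        IndexRequest indexRequest = new IndexRequest("users"); //1 the index
        indexRequest.id("1"); //2 the id of the document
        // Option 1: key-value pairs
//        indexRequest.source("userName","zhangsan","age",18);
        // Option 2: a JSON string
        User user = new User("zhangsan","18");
        String jsonString = JSON.toJSONString(user);
        indexRequest.source(jsonString, XContentType.JSON); //3 the content to save

        //4 the response to the request
        IndexResponse index = esRestClient.index(indexRequest, ElasticSearchConfig.COMMON_OPTIONS);
        System.out.println(index);
    }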

    // Search data
    @Test
    public void searchData() throws IOException {
        //1 Create the search request
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices("bank"); // the index to search

        //2 Specify the DSL, i.e. the search conditions
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); // builds the query; it offers every root operation
//        searchSourceBuilder.query(); // all root operations are built here
//        searchSourceBuilder.from();
//        searchSourceBuilder.size();
//        searchSourceBuilder.sort();
        //2.1 a match inside query
        searchSourceBuilder.query(QueryBuilders.matchQuery("address","mill"));
        //2.2 aggregation: distribution of the age values
        TermsAggregationBuilder termsAggregationBuilder = AggregationBuilders.terms("ageAgg").field("age").size(10); // terms aggregation
        searchSourceBuilder.aggregation(termsAggregationBuilder);
        //2.3 aggregation: average balance
        AvgAggregationBuilder avgAggregationBuilder = AggregationBuilders.avg("balanceAvg").field("balance");
        searchSourceBuilder.aggregation(avgAggregationBuilder);

        searchRequest.source(searchSourceBuilder); // attach the search conditions to the searchRequest

        //3 Execute the search
        SearchResponse searchResponse = esRestClient.search(searchRequest, ElasticSearchConfig.COMMON_OPTIONS);

        //4 Analyze the result, searchResponse
//        System.out.println(searchResponse);
//        Map map = JSON.parseObject(searchResponse.toString(), Map.class); // wrap the result in a Map
        //4.1 Get the matched documents directly
        SearchHits hits = searchResponse.getHits(); // the outer hits
        SearchHit[] searchHits = hits.getHits(); // the actual hits; _source inside holds the original data, usually mapped to a bean
        for (SearchHit hit : searchHits) {
//            hits" : [
//            {
//                "_index" : "bank",
//                    "_type" : "_doc",
//                    "_id" : "970",
//                    "_score" : 5.4032025,
//                    "_source" : {
//            hit.getIndex();hit.getType();hit.getScore(); // every field has a matching getter
            String string = hit.getSourceAsString(); // the document as a string, used to build the bean
//            System.out.println(string); // an online formatting tool can generate the entity class from this
            Account account = JSON.parseObject(string, Account.class); // convert to the entity class
            System.out.println(account);
        }
        //4.2 Get the aggregation results of this search
        Aggregations aggregations = searchResponse.getAggregations(); // the analysis results are in the aggregations
        // Result of the aggregation by age
        Terms ageAgg = aggregations.get("ageAgg"); // look the result up by the name given when the aggregation was defined
        for (Terms.Bucket bucket : ageAgg.getBuckets()) { // the data is in the buckets of each aggregation
            String key = bucket.getKeyAsString();
            long docCount = bucket.getDocCount();
            System.out.println("key: " + key + ", doc_count: " + docCount);
        }
        // Average balance aggregation
        Avg balanceAvg = aggregations.get("balanceAvg"); // the average computed by the aggregation
        System.out.println("balanceAvg: " + balanceAvg.getValueAsString());
    }
}

The Kibana console equivalent of the test code

06 Appendix: Installing nginx

Original article: https://outofmemory.cn/zaji/5704830.html
