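ElasticSearch Distributed Search Engine, from Getting Started to Hands-On Use (Hands-On Part: JD.com-Style Home Page Product Search with Highlighting)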

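1. Getting familiar with SpringBoot integration with ElasticSearch
- 1.1 Official documentation
- 1.2 Creating and configuring the integration project
- 1.3 Testing indices: create, get, delete
- 1.4 Testing documents: create, read, update, delete
2. ElasticSearch in practice: JD.com-style home page search with highlighting
- 2.1 Creating the project
- 2.2 Pulling data with a basic crawler (jsoup)
- 2.3 Writing the service-layer interface and implementation
- 2.4 Writing the Controller layer
- 2.5 Testing the endpoints
- 2.6 Separating front end and back end (simple use of Vue)
- 2.7 Highlighting the keyword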

1. Getting familiar with SpringBoot integration with ElasticSearch

1.1 Official documentation

Official elasticsearch documentation: https://www.elastic.co/guide/index.html

Operating ES through its REST-style API is recommended; the official Java REST Client documentation is enough to follow along:
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/index.html

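1.2 Creating and configuring the integration project

1. Add the SpringBoot starter that brings in the ES client

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>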

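2. Pin the versions

<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.2.5.RELEASE</version>
    <relativePath/>
</parent>

<!-- elasticsearch.version overrides the ES client version managed by spring-boot-starter-parent -->
<properties>
    <java.version>1.8</java.version>
    <elasticsearch.version>7.6.1</elasticsearch.version>
</properties>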

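3. Add the key dependencies used later on

        <!-- lombok -->
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>

        <!-- fastjson -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.70</version>
        </dependency>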

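4. Create the configuration class

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchRestClientConfig {
    // Register the REST high-level client as a bean in the Spring container.
    // Naming the method after the return type keeps later autowiring straightforward.
    @Bean
    public RestHighLevelClient restHighLevelClient() {
        return new RestHighLevelClient(
                RestClient.builder(new HttpHost("127.0.0.1", 9200, "http"))
        );
    }
}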

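5. Create the test entity class

@Data // generates getters and setters (plus toString, equals and hashCode)
@NoArgsConstructor // generates the no-argument constructor
@AllArgsConstructor // generates the all-arguments constructor
public class User implements Serializable {
    private String name;
    private Integer age;
}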

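1.3 Testing indices: create, get, delete

First start the elasticsearch service and the es-head plugin.
Then start the project's main class, SpringbootElasticsearchApiApplication, because RestHighLevelClient has to be registered in the Spring container. Be sure to do this before testing so that the later tests do not fail; this was a lesson learned the hard way!
It is best to write the tests in the SpringbootElasticsearchApplicationTests class under the test package.

6.1 Create an index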

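import java.io.IOException;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest
class SpringbootElasticsearchApplicationTests {

    @Autowired
    RestHighLevelClient restHighLevelClient;

    @Test
    public void testPUTCreateIndex() throws IOException {
        // Build the create-index request; the index name is set in the constructor
        CreateIndexRequest request = new CreateIndexRequest("yxj_index");
        // Send it with the default request options and receive the response
        CreateIndexResponse response = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);

        System.out.println(response.isAcknowledged()); // true if the index was created
        System.out.println(response); // print the response object
        restHighLevelClient.close(); // always close the client when finished
    }

}

Console output: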
true
org.elasticsearch.client.indices.CreateIndexResponse@5565235d

6.2 Get an index and check whether it exists

    @Test
    public void testGETIndexAndIsExists() throws IOException {
        // Build the get-index request
        GetIndexRequest request = new GetIndexRequest("yxj_index");
        // Fetch the index information
        GetIndexResponse response = restHighLevelClient.indices().get(request, RequestOptions.DEFAULT);
        // Check whether the index exists
        boolean exists = restHighLevelClient.indices().exists(request, RequestOptions.DEFAULT);

        // Prints the String[] reference, not its contents; wrap it in Arrays.toString(...) to see the index names
        System.out.println(response.getIndices());
        System.out.println(exists); // whether the index exists
        restHighLevelClient.close(); // always close the client when finished
    }


Console output:
[Ljava.lang.String;@36790bec
true

6.3 Delete an index

    @Test
    public void testDeleteIndex() throws IOException {
        // Build the delete-index request
        DeleteIndexRequest request = new DeleteIndexRequest("yxj_index");
        // Send it and receive the response
        AcknowledgedResponse response = restHighLevelClient.indices().delete(request, RequestOptions.DEFAULT);

        System.out.println(response.isAcknowledged()); // true if the deletion succeeded
        restHighLevelClient.close();
    }

Console output:
true

1.4 Testing documents: create, read, update, delete

1. Add a document

    @Test
    void testAddDocument() throws IOException {
        // Create the object to store
        User user = new User("一宿君", 21);
        // Build the request against the target index
        IndexRequest request = new IndexRequest("yxj_index");
        // Equivalent REST call: PUT /yxj_index/_doc/1
        request.id("1");
        // Set the timeout; either form works, and the later call overrides the earlier one
        request.timeout("1s");
        request.timeout(TimeValue.timeValueMinutes(1));

        // Put the data into the request as JSON
        request.source(JSON.toJSONString(user), XContentType.JSON);

        // Send the request and receive the response
        IndexResponse response = restHighLevelClient.index(request, RequestOptions.DEFAULT);

        System.out.println(response.status()); // status of the document operation
        System.out.println(response); // full response for the document operation
        restHighLevelClient.close();
    }



Console output:
CREATED
IndexResponse[index=yxj_index,type=_doc,id=1,version=1,result=created,seqNo=0,primaryTerm=1,shards={"total":2,"successful":1,"failed":0}]

2. Get a document

    @Test
    void testGetDocumentAndIsExists() throws IOException {
        // Build the get request with the index name and the document id
        GetRequest request = new GetRequest("yxj_index", "1");
        // When only existence matters, skipping _source avoids fetching the content and is cheaper
        //request.fetchSourceContext(new FetchSourceContext(false));
        // Do not fetch any stored fields
        //request.storedFields("_none_");

        // Check that the document exists before fetching it
        boolean exists = restHighLevelClient.exists(request, RequestOptions.DEFAULT);

        if (exists) {
            System.out.println("Document exists...");
            // Fetch the document. If the _source filter above is enabled on this request,
            // no content comes back, so keep those lines commented out here.
            GetResponse response = restHighLevelClient.get(request, RequestOptions.DEFAULT);

            System.out.println(response.getSourceAsString()); // the whole document as a string
            System.out.println(response); // the full response (same as what the Kibana console returns)
        } else {
            System.out.println("Document does not exist!");
        }

        restHighLevelClient.close(); // close the client
    }


Console output:
Document exists...
{"age":21,"name":"一宿君"}
{"_index":"yxj_index","_type":"_doc","_id":"1","_version":1,"_seq_no":0,"_primary_term":1,"found":true,"_source":{"age":21,"name":"一宿君"}}

3. Update a document

    @Test
    void testUpdateDocument() throws IOException {
        // Build the update request
        UpdateRequest request = new UpdateRequest("yxj_index", "1");
        // Create the new data
        User user = new User("一宿君Java", 19);
        // Put the data into the request as JSON
        request.doc(JSON.toJSONString(user), XContentType.JSON);
        // Send the request
        UpdateResponse response = restHighLevelClient.update(request, RequestOptions.DEFAULT);

        System.out.println(response.status()); // status of the update
        restHighLevelClient.close(); // close the client
    }

Console output:
OK

4. Delete a document


    @Test
    void testDeleteDocument() throws IOException {
        // Build the delete request
        DeleteRequest request = new DeleteRequest("yxj_index", "1");

        // Send the request
        DeleteResponse response = restHighLevelClient.delete(request, RequestOptions.DEFAULT);

        System.out.println(response.status()); // status of the deletion
        restHighLevelClient.close(); // close the client
    }

Console output:
OK

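5. Bulk-insert documents


    @Test
    void testBulkInsertDocument() throws IOException {
        // Build the bulk request
        BulkRequest request = new BulkRequest();
        request.timeout("1s");

        // Build the list of documents
        List<User> userList = new ArrayList<>();
        userList.add(new User("一宿君1", 1));
        userList.add(new User("一宿君2", 2));
        userList.add(new User("一宿君3", 3));
        userList.add(new User("一宿君4", 4));
        userList.add(new User("一宿君5", 5));
        userList.add(new User("一宿君6", 6));

        // Add one IndexRequest per document. The source listing is cut off inside this loop;
        // the rest is reconstructed from the bulk code in section 2.3 and the bulk_index hits in the next example.
        for (int i = 0; i < userList.size(); i++) {
            request.add(
                    new IndexRequest("bulk_index")
                            .id("" + (i + 1))
                            .source(JSON.toJSONString(userList.get(i)), XContentType.JSON)
            );
        }

        // Send the bulk request
        BulkResponse response = restHighLevelClient.bulk(request, RequestOptions.DEFAULT);
        System.out.println(response.hasFailures()); // false means every item succeeded
        restHighLevelClient.close();
    }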

6. Search documents with a condition

    @Test
    void testHasConditionSearch() throws IOException {
        // Build the search request
        SearchRequest request = new SearchRequest();
        // Build the search source (query, highlighting, paging)
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        // Full-text match on the name field; the term query below would be the exact-match alternative
        MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("name", "一宿君");
        //TermQueryBuilder queryBuilder = QueryBuilders.termQuery("name","一宿君");

        // Put the query into the search source
        searchSourceBuilder.query(matchQueryBuilder);
        // Enable highlighting (no field is configured here, so highlightFields in the output below is empty)
        searchSourceBuilder.highlighter(new HighlightBuilder());
        // Paging: start at offset 0 and return 3 hits
        searchSourceBuilder.from(0);
        searchSourceBuilder.size(3);

        // Put the search source into the request
        request.source(searchSourceBuilder);

        // Restrict the search to one index; without this, all indices are searched
        request.indices("bulk_index");

        // Send the request
        SearchResponse response = restHighLevelClient.search(request, RequestOptions.DEFAULT);

        System.out.println(response.status()); // status of the search
        System.out.println(response); // the full response

        // Get the result set and iterate over it
        SearchHits hits = response.getHits(); // the outer hits object, including totals and scores
        System.out.println(JSON.toJSONString(hits)); // the result set as JSON
        System.out.println("============================================================");

        // The inner hits array holds the actual documents
        for (SearchHit documentFields : hits.getHits()) {
            System.out.println(documentFields.getSourceAsString()); // the document as a string
            //System.out.println(documentFields.getSourceAsMap()); // the document as a map
        }

        restHighLevelClient.close(); // close the client
    }

Console output:
OK
{"took":19,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":6,"relation":"eq"},"max_score":0.22232392,"hits":[{"_index":"bulk_index","_type":"_doc","_id":"1","_score":0.22232392,"_source":{"age":1,"name":"一宿君1"}},{"_index":"bulk_index","_type":"_doc","_id":"2","_score":0.22232392,"_source":{"age":2,"name":"一宿君2"}},{"_index":"bulk_index","_type":"_doc","_id":"3","_score":0.22232392,"_source":{"age":3,"name":"一宿君3"}}]}}
{"fragment":true,"hits":[{"fields":{},"fragment":false,"highlightFields":{},"id":"1","matchedQueries":[],"primaryTerm":0,"rawSortValues":[],"score":0.22232392,"seqNo":-2,"sortValues":[],"sourceAsMap":{"name":"一宿君1","age":1},"sourceAsString":"{\"age\":1,\"name\":\"一宿君1\"}","sourceRef":{"fragment":true},"type":"_doc","version":-1},{"fields":{},"fragment":false,"highlightFields":{},"id":"2","matchedQueries":[],"primaryTerm":0,"rawSortValues":[],"score":0.22232392,"seqNo":-2,"sortValues":[],"sourceAsMap":{"name":"一宿君2","age":2},"sourceAsString":"{\"age\":2,\"name\":\"一宿君2\"}","sourceRef":{"fragment":true},"type":"_doc","version":-1},{"fields":{},"fragment":false,"highlightFields":{},"id":"3","matchedQueries":[],"primaryTerm":0,"rawSortValues":[],"score":0.22232392,"seqNo":-2,"sortValues":[],"sourceAsMap":{"name":"一宿君3","age":3},"sourceAsString":"{\"age\":3,\"name\":\"一宿君3\"}","sourceRef":{"fragment":true},"type":"_doc","version":-1}],"maxScore":0.22232392,"totalHits":{"relation":"EQUAL_TO","value":6}}
============================================================
{"age":1,"name":"一宿君1"}
{"age":2,"name":"一宿君2"}
{"age":3,"name":"一宿君3"}
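
In the output above, highlightFields is empty because the HighlightBuilder was given no field to highlight. As a rough client-side illustration of what highlighting produces once a field and pre/post tags are configured, the sketch below wraps each occurrence of a keyword in tags using plain Java. This is not how ES does it (ES highlights analyzed terms on the server); the class name HighlightSketch, the highlight helper and the red span tags are all assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a literal, case-insensitive stand-in for server-side highlighting.
public class HighlightSketch {

    // Wraps every occurrence of keyword in text with preTag and postTag.
    public static String highlight(String text, String keyword, String preTag, String postTag) {
        if (text == null || keyword == null || keyword.isEmpty()) {
            return text;
        }
        // Pattern.quote treats the keyword as a literal, so characters like '.' are not regex syntax
        Pattern pattern = Pattern.compile(Pattern.quote(keyword),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            // Keep the matched text as it appeared, only adding the tags around it
            String wrapped = preTag + matcher.group() + postTag;
            matcher.appendReplacement(sb, Matcher.quoteReplacement(wrapped));
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "Java编程思想 java核心技术";
        // prints: <span style='color:red'>Java</span>编程思想 <span style='color:red'>java</span>核心技术
        System.out.println(highlight(name, "java", "<span style='color:red'>", "</span>"));
    }
}
```

With the real client, the equivalent effect comes from naming the field on the HighlightBuilder and reading the fragments from each hit's highlight fields instead of from _source.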
 
 
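2. ElasticSearch in practice: JD.com-style home page search with highlighting

2.1 Creating the project

Static page resource bundle:

 Link: https://pan.baidu.com/s/1L8_NtjVLMmOooK2m-L0Tlw
 Extraction code: 9gjc


Configure application.properties:
# Change the port
server.port=9090
# Disable the thymeleaf cache
spring.thymeleaf.cache=false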
 
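Add the required dependencies (pay particular attention to the version numbers):

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.2.5.RELEASE</version>
        <relativePath/>
    </parent>
    <groupId>com.wbs</groupId>
    <artifactId>springboot-elasticsearch-jd</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>springboot-elasticsearch-jd</name>

    <properties>
        <java.version>1.8</java.version>
        <elasticsearch.version>7.6.1</elasticsearch.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-thymeleaf</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.70</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <scope>runtime</scope>
            <optional>true</optional>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-configuration-processor</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>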
 
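Write the IndexController:
@Controller
public class IndexController {

    // Both "/" and "/index" render the index template
    @RequestMapping({"/", "/index"})
    public String toIndex() {
        return "index";
    }
}

Start the project and open localhost:9090 to confirm that the application starts and the home page is reachable.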
  
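2.2 Pulling data with a basic crawler (jsoup)
There are many ways to obtain data:
- databases
- message queues
- caches
- crawlers
- and so on
1. Add the jsoup dependency

        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.10.2</version>
        </dependency>

2. Search for a product keyword on the JD.com home page
Check the URL in the address bar:
https://search.jd.com/Search?keyword=Java&enc=utf-8

3. Inspect the page elements

4. Write a utility class to crawl the data (fetch the returned page and pick out the useful parts)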
import java.io.IOException;
import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class HtmlParseUtilTest {
    public static void main(String[] args) throws IOException {
        // 1. The request URL
        String url = "https://search.jd.com/Search?keyword=Java&enc=utf-8";
        // 2. Parse the page; jsoup returns a Document, much like the browser's document object,
        //    through which every HTML element on the page can be reached (30000 ms timeout)
        Document document = Jsoup.parse(new URL(url), 30000);

        // 3. Get the product list by the id found when inspecting the page
        Element element = document.getElementById("J_goodsList");

        // 4. Get every li element under the list
        Elements elements = element.getElementsByTag("li");

        // 5. Read the product attributes under each li: img, price, name, ...
        for (Element el : elements) {
            System.out.println("img-src:" + el.getElementsByTag("img").eq(0).attr("src")); // first image under the li
            System.out.println("name:" + el.getElementsByClass("p-name").eq(0).text()); // product name
            System.out.println("price:" + el.getElementsByClass("p-price").eq(0).text()); // product price
            System.out.println("shopname:" + el.getElementsByClass("hd-shopname").eq(0).text()); // shop (publisher) name
            System.out.println("================================================================================================");
        }

    }
}
 
 
 
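 The situation above (the src attribute not yielding the image URL) arises because large sites carry many images and usually render them with lazy loading, which improves response time.

Change the attribute used to fetch the image to: data-lazy-img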
 
 5. Write an entity class to hold the product attributes
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Product implements Serializable {
    private String name;
    private String img;
    private String price;
    private String shopname;

    // ...add more attributes as needed; only a few key ones are listed here
}
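
Because all four fields are Strings, passing constructor arguments in the wrong order compiles without complaint. Lombok's @AllArgsConstructor orders its parameters by field declaration, here name, img, price, shopname. The sketch below shows the pitfall with a hand-written stand-in class; PlainProduct and the sample values are hypothetical and only mirror the first two fields' accessors.

```java
// Hypothetical hand-written equivalent of the Lombok-annotated Product, for illustration only.
public class PlainProduct {
    private final String name;
    private final String img;
    private final String price;
    private final String shopname;

    // Same parameter order that @AllArgsConstructor derives from the field declarations
    public PlainProduct(String name, String img, String price, String shopname) {
        this.name = name;
        this.img = img;
        this.price = price;
        this.shopname = shopname;
    }

    public String getName() {
        return name;
    }

    public String getImg() {
        return img;
    }

    public static void main(String[] args) {
        // Correct order: name first, then img
        PlainProduct ok = new PlainProduct("Java book", "//img.example/1.jpg", "99.00", "Some shop");
        // Swapped order still compiles, but the image URL ends up in the name field
        PlainProduct swapped = new PlainProduct("//img.example/1.jpg", "Java book", "99.00", "Some shop");

        System.out.println(ok.getName());      // prints: Java book
        System.out.println(swapped.getName()); // prints: //img.example/1.jpg
    }
}
```

Since the search later matches on the name field, a swap like this would make keyword queries run against image URLs, so it is worth checking the argument order wherever Product objects are built.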
 
6. Rework the page-parsing utility class so that it returns the data
public class HtmlParseUtil {
    public static void main(String[] args) throws IOException {
        new HtmlParseUtil().parseJD("Java").forEach(System.out::println);
    }

    public List<Product> parseJD(String keyword) throws IOException {
        // 1. The request URL
        String url = "https://search.jd.com/Search?keyword=" + keyword + "&enc=utf-8";
        // 2. Parse the page; jsoup returns a Document through which every HTML element can be reached
        Document document = Jsoup.parse(new URL(url), 30000);

        // 3. Get the product list by the id found when inspecting the page
        Element element = document.getElementById("J_goodsList");

        // 4. Get every li element under the list
        Elements elements = element.getElementsByTag("li");

        // 5. Create the list that collects the results
        List<Product> productArrayList = new ArrayList<>();

        // 6. Read img, price, name and shopname under each li and add them to the list
        for (Element el : elements) {
            String img = el.getElementsByTag("img").eq(0).attr("data-lazy-img"); // first image under the li (lazy-loaded)
            String name = el.getElementsByClass("p-name").eq(0).text(); // product name
            String price = el.getElementsByClass("p-price").eq(0).text(); // product price
            String shopname = el.getElementsByClass("hd-shopname").eq(0).text(); // shop (publisher) name

            // Build the product; the argument order must follow the field order in Product
            Product product = new Product(name, img, price, shopname);
            // Add it to the list
            productArrayList.add(product);
        }
        // Return the list
        return productArrayList;
    }

}
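
The search URL above is built by plain string concatenation, which works for a keyword such as Java but can break for keywords containing spaces, & or non-ASCII characters. A minimal sketch of encoding the keyword first, using only the JDK; SearchUrlSketch and buildSearchUrl are hypothetical names, not part of the original project:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that URL-encodes the keyword before building the JD search URL.
public class SearchUrlSketch {

    public static String buildSearchUrl(String keyword) {
        try {
            // The two-argument overload exists on Java 8, which this project targets
            String encoded = URLEncoder.encode(keyword, StandardCharsets.UTF_8.name());
            return "https://search.jd.com/Search?keyword=" + encoded + "&enc=utf-8";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should be unreachable
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl("Java"));  // prints: https://search.jd.com/Search?keyword=Java&enc=utf-8
        System.out.println(buildSearchUrl("a b&c")); // prints: https://search.jd.com/Search?keyword=a+b%26c&enc=utf-8
    }
}
```

Inside parseJD, the result of such a helper could be passed to new URL(...) in place of the concatenated string.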

 
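Note: @AllArgsConstructor generates the constructor parameters in field declaration order, so the arguments to new Product(...) must be passed as name, img, price, shopname.

Run it and check the result: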
  
2.3 Writing the service-layer interface and implementation
// Interface (only the implementation class needs @Service)
public interface ProductService {


    // Crawl the data and store it in ES
    Boolean parseProductSafeEs(String keyword) throws IOException;

    // Paged search
    List<Map<String, Object>> searchProduct(String keyword, int pageNum, int pageSize) throws IOException;

}




// Implementation
@Service
public class ProductServiceImpl implements ProductService {

    @Autowired
    RestHighLevelClient restHighLevelClient;

    @Override
    public Boolean parseProductSafeEs(String keyword) throws IOException {
        // Crawl the product list
        List<Product> productList = new HtmlParseUtil().parseJD(keyword);
        // Build the bulk request
        BulkRequest request = new BulkRequest();
        // Set the timeout to 2s
        request.timeout("2s");

        // Add every product to the bulk request
        for (int i = 0; i < productList.size(); i++) {
            request.add(
                    new IndexRequest("jd_pro_index")
                            .id("" + (i + 1))
                            .source(JSON.toJSONString(productList.get(i)), XContentType.JSON)
            );
        }

        // Send the request
        BulkResponse response = restHighLevelClient.bulk(request, RequestOptions.DEFAULT);
        boolean bool = response.hasFailures(); // true means at least one item failed, false means all succeeded
        //restHighLevelClient.close();
        return !bool;
    }

    @Override
    public List<Map<String, Object>> searchProduct(String keyword, int pageNum, int pageSize) throws IOException {
        if (pageNum < 0) {
            pageNum = 0;
        }

        // Build the search request
        SearchRequest request = new SearchRequest("jd_pro_index");

        // Build the search source
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        // Full-text match of keyword against the name field (the term query would be the exact-match alternative)
        //TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", keyword);
        MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("name", keyword);

        // Put the query into the search source
        searchSourceBuilder.query(matchQueryBuilder);
        // Set the timeout to 60s
        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        // Paging: from() takes the offset of the first hit, not a page index, so pageNum acts as that offset here
        searchSourceBuilder.from(pageNum);
        searchSourceBuilder.size(pageSize); // hits per page

        // Put the search source into the request
        request.source(searchSourceBuilder);

        // Send the request and receive the response
        SearchResponse response = restHighLevelClient.search(request, RequestOptions.DEFAULT);

        // Do not close the shared client here
        //restHighLevelClient.close();

        // Extract the hits
        SearchHits searchHits = response.getHits();

        // Create the result list
        List<Map<String, Object>> mapList = new ArrayList<>();

        // Copy the source of every hit into the list
        for (SearchHit hit : searchHits.getHits()) {
            Map<String, Object> hitMap = hit.getSourceAsMap();
            mapList.add(hitMap);
        }

        return mapList;
    }
}
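
The from parameter of an ES search is the zero-based offset of the first hit to return, not a page index, so a page number taken from the URL usually needs converting before it is passed to from(). A minimal sketch of that conversion, assuming 1-based page numbers; PageOffsetSketch and toOffset are hypothetical names, not part of the original project:

```java
// Hypothetical helper that turns a 1-based page number into a search offset.
public class PageOffsetSketch {

    // Returns the offset of the first hit on the given page.
    // Page numbers below 1 are treated as page 1; page sizes below 1 are treated as 1.
    public static int toOffset(int pageNum, int pageSize) {
        int safePage = Math.max(pageNum, 1);
        int safeSize = Math.max(pageSize, 1);
        return (safePage - 1) * safeSize;
    }

    public static void main(String[] args) {
        System.out.println(toOffset(1, 10)); // prints: 0
        System.out.println(toOffset(3, 10)); // prints: 20
        System.out.println(toOffset(0, 10)); // prints: 0
    }
}
```

In searchProduct this would mean calling searchSourceBuilder.from(toOffset(pageNum, pageSize)), so that page 2 with a page size of 10 starts at hit 10 rather than at hit 2.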
 
2.4 Writing the Controller layer

 Note: none of the methods here should close the RestHighLevelClient. It is a single shared bean, so once it is closed the other methods can no longer use it and fail with an IO exception.

@Controller
public class ProductController {

    @Autowired
    RestHighLevelClient restHighLevelClient;

    @Autowired
    ProductService productService;

    // Create the index
    @RequestMapping("/createIndex")
    @ResponseBody
    public String creatIndex() throws IOException {
        CreateIndexRequest request = new CreateIndexRequest("jd_pro_index");
        CreateIndexResponse response = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);

        System.out.println(response.isAcknowledged());

        if (response.isAcknowledged()) {
            return "Index created!";
        } else {
            return "Index creation failed!";
        }

    }

    // Delete the index
    @RequestMapping("/deleteIndex")
    @ResponseBody
    public String deleteIndex() throws IOException {
        DeleteIndexRequest request = new DeleteIndexRequest("jd_pro_index");
        AcknowledgedResponse response = restHighLevelClient.indices().delete(request, RequestOptions.DEFAULT);

        System.out.println(response.isAcknowledged());


        if (response.isAcknowledged()) {
            return "Index deleted!";
        } else {
            return "Index deletion failed!";
        }

    }


    // Crawl products for the keyword and store them in ES
    @RequestMapping("/toSafeEs/{keyword}")
    @ResponseBody
    public String parseProductSafeEs(@PathVariable("keyword") String keyword) throws IOException {
        if (productService.parseProductSafeEs(keyword)) {
            return "Crawled data stored in ES!";
        }
        return "Failed to store crawled data";
    }


    // Paged search by keyword
    @RequestMapping("/searchEsDoc/{keyword}/{pageNum}/{pageSize}")
    @ResponseBody
    public List<Map<String, Object>> searchProduct(
            @PathVariable("keyword") String keyword,
            @PathVariable("pageNum") int pageNum,
            @PathVariable("pageSize") int pageSize) throws IOException {
        // Returns the hits as a list of maps (null is passed through unchanged)
        return productService.searchProduct(keyword, pageNum, pageSize);
    }

}
 
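2.5 Testing the endpoints
Create the index (localhost:9090/createIndex):

 Crawl the data and store it in ES (localhost:9090/toSafeEs/{keyword}):


Query the data (localhost:9090/searchEsDoc/{keyword}/{pageNum}/{pageSize}):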
  
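2.6 Separating front end and back end (simple use of Vue)
Download the vue dependency: used to render the front-end page
Download the axios dependency: used for ajax requests to the back-end endpoints
Both vue and axios can be downloaded from their official sites. Here is a small trick picked up from Kuangshen (狂神): create a folder with an English name locally and open cmd in that directory (Node.js must already be installed):
# If the directory has not been initialized before, initialize it first
npm init

# Install vue
npm install vue

# Install axios
npm install axios



 Modify the index.html home page: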




    
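[The index.html markup did not survive extraction. Recoverable text: page title "一宿君Java-ES仿京东实战"; tab label "天猫搜索" (Tmall search).]
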
Feel free to share; when reposting, please credit the source: 内存溢出
Original URL: https://outofmemory.cn/zaji/5618589.html
