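ElasticSearch Distributed Search Engine: From Getting Started to Real-World Use (Hands-on: A JD.com-Style Home Page Product Search with Highlighting)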

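  • 1. Getting familiar with Spring Boot + ElasticSearch
    • 1.1 Official documentation
    • 1.2 Project setup and configuration
    • 1.3 Testing indices: create, delete, get
    • 1.4 Testing documents: create, delete, update, get
  • 2. ElasticSearch in practice: a JD.com-style home page search with highlighting
    • 2.1 Create the project
    • 2.2 Basic crawler to fetch data (jsoup)
    • 2.3 Write the service layer interface and implementation class
    • 2.4 Write the Controller (front-end access layer)
    • 2.5 Test the endpoints
    • 2.6 Front end/back end separation (simple Vue usage)
    • 2.7 Highlighting the keyword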

1. Getting familiar with Spring Boot + ElasticSearch

1.1 Official documentation

Official Elasticsearch guide: https://www.elastic.co/guide/index.html

Working with ES through its REST API is the recommended approach. You can follow the official Java REST Client guide directly:
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/index.html
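Every high-level client request used below is ultimately a REST call. The sketch below gives a rough reference table mapping each request type to its Elasticsearch 7.x endpoint. It is plain JDK code that only builds strings. The index name yxj_index and id 1 are the values used in the tests below, and the class name RestEquivalents is purely illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Reference table: high-level client request -> equivalent Elasticsearch 7.x REST call.
// Pure JDK code; it only builds strings and never contacts a cluster.
class RestEquivalents {

    static Map<String, String> forIndex(String index, String id) {
        Map<String, String> calls = new LinkedHashMap<>(); // keeps insertion order for printing
        calls.put("CreateIndexRequest", "PUT /" + index);
        calls.put("GetIndexRequest (get)", "GET /" + index);
        calls.put("GetIndexRequest (exists)", "HEAD /" + index);
        calls.put("DeleteIndexRequest", "DELETE /" + index);
        calls.put("IndexRequest with id", "PUT /" + index + "/_doc/" + id);
        calls.put("GetRequest", "GET /" + index + "/_doc/" + id);
        calls.put("UpdateRequest", "POST /" + index + "/_update/" + id);
        calls.put("DeleteRequest", "DELETE /" + index + "/_doc/" + id);
        calls.put("BulkRequest", "POST /_bulk");
        calls.put("SearchRequest", "POST /" + index + "/_search");
        return calls;
    }

    public static void main(String[] args) {
        forIndex("yxj_index", "1").forEach((request, rest) -> System.out.println(request + " -> " + rest));
    }
}
```

To double-check what a test actually did, you can replay the corresponding call in Kibana Dev Tools. For _search, GET works as well as POST.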


1.2 Project setup and configuration


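1. Add the Spring Boot Elasticsearch starter dependency

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>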

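2. Align the versions. Spring Boot 2.2.5 manages an older (6.x) Elasticsearch client by default, so override elasticsearch.version to match your local ES version (7.6.1 here).

<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.2.5.RELEASE</version>
</parent>

<properties>
    <java.version>1.8</java.version>
    <elasticsearch.version>7.6.1</elasticsearch.version>
</properties>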

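3. Add the other key dependencies used later

<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <optional>true</optional>
</dependency>

<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.70</version>
</dependency>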

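4. Create the configuration class

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchRestClientConfig {
    // Register the REST high-level client in the Spring container.
    // The bean name defaults to the method name, so naming the method after the return type
    // (restHighLevelClient) keeps autowiring by name straightforward later on.
    @Bean
    public RestHighLevelClient restHighLevelClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("127.0.0.1", 9200, "http"))
        );
        return client;
    }
}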

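5. Create a test entity class

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import java.io.Serializable;

@Data // generates getters and setters (plus equals, hashCode, toString)
@NoArgsConstructor // generates a no-args constructor
@AllArgsConstructor // generates a constructor taking all fields
public class User implements Serializable {
    private String name;
    private Integer age;
}
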
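1.3 Testing indices: create, delete, get
  • First start the Elasticsearch service and the es-head plugin.
  • Make sure the project's main class, SpringbootElasticsearchApiApplication, starts cleanly before testing. @SpringBootTest boots the same application context, and RestHighLevelClient must be registered in it, otherwise the tests below fail. I learned this the hard way!
  • Put the tests in the SpringbootElasticsearchApplicationTests class under the test package.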

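6.1 Create an index

import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;

@SpringBootTest
class SpringbootElasticsearchApplicationTests {

    @Autowired
    RestHighLevelClient restHighLevelClient;

    @Test
    public void testPUTCreateIndex() throws IOException {
        // Create the create-index request; the index name can be set in the constructor
        CreateIndexRequest request = new CreateIndexRequest("yxj_index");
        // Execute it with the default request options and get the response
        CreateIndexResponse response = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);

        System.out.println(response.isAcknowledged()); // whether the index was created
        System.out.println(response); // the response object
        // Close the client when done. Note: this is the shared Spring bean, so any test run
        // after this one in the same context will fail; run the tests one at a time.
        restHighLevelClient.close();
    }

}

Console output: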
true
org.elasticsearch.client.indices.CreateIndexResponse@5565235d


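6.2 Get an index and check whether it exists

    @Test
    public void testGETIndexAndIsExists() throws IOException {
        // Create the get-index request
        GetIndexRequest request = new GetIndexRequest("yxj_index");
        // Execute it and get the response
        GetIndexResponse response = restHighLevelClient.indices().get(request, RequestOptions.DEFAULT);
        // Check whether the index exists
        boolean exists = restHighLevelClient.indices().exists(request, RequestOptions.DEFAULT);

        // getIndices() returns a String[], so println shows only the array reference;
        // use Arrays.toString(...) to see the index names (the index holds no data yet)
        System.out.println(response.getIndices());
        System.out.println(exists); // whether the index exists
        restHighLevelClient.close(); // close the client when done
    }


Console output: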
[Ljava.lang.String;@36790bec
true

6.3 Delete an index

    @Test
    public void testDeleteIndex() throws IOException {
        // Create the delete-index request
        DeleteIndexRequest request = new DeleteIndexRequest("yxj_index");
        // Execute it and get the response
        AcknowledgedResponse response = restHighLevelClient.indices().delete(request, RequestOptions.DEFAULT);

        System.out.println(response.isAcknowledged()); // whether the deletion succeeded
        restHighLevelClient.close();
    }

Console output:
true

1.4 Testing documents: create, delete, update, get

1. Add a document

    @Test
    void testAddDocument() throws IOException {
        // Create the object
        User user = new User("一宿君", 21);
        // Create the request against the index
        IndexRequest request = new IndexRequest("yxj_index");
        // Equivalent REST call: PUT /yxj_index/_doc/1
        request.id("1");
        request.timeout("1s"); // set the timeout to 1 s
        request.timeout(TimeValue.timeValueMinutes(1)); // either form works; this later call overrides the "1s" above

        // Put the data into the request as JSON
        request.source(JSON.toJSONString(user), XContentType.JSON);

        // Send the request and get the response
        IndexResponse response = restHighLevelClient.index(request, RequestOptions.DEFAULT);

        System.out.println(response.status()); // status of the document operation
        System.out.println(response); // full response of the document operation
        restHighLevelClient.close();
    }



Console output:
CREATED
IndexResponse[index=yxj_index,type=_doc,id=1,version=1,result=created,seqNo=0,primaryTerm=1,shards={"total":2,"successful":1,"failed":0}]


2. Get a document

    @Test
    void testGetDocumntAndIsExits() throws IOException {
        // Create the get request with the index name and document id
        GetRequest request = new GetRequest("yxj_index", "1");
        // Skip the _source: if we only need to know whether the document exists, this is more efficient
        //request.fetchSourceContext(new FetchSourceContext(false));
        // Do not fetch any stored fields
        //request.storedFields("_none_");

        // Before fetching the value, check whether the document exists (more efficient)
        boolean exists = restHighLevelClient.exists(request, RequestOptions.DEFAULT);

        if (exists) {
            System.out.println("Document exists...");
            // Send the request and get the response. If the _source filtering above is enabled,
            // no content comes back, so keep it commented out here.
            GetResponse response = restHighLevelClient.get(request, RequestOptions.DEFAULT);

            System.out.println(response.getSourceAsString()); // the whole document as a string
            System.out.println(response); // the full response (same as the Kibana console output)
        } else {
            System.out.println("Document does not exist!!!");
        }

        restHighLevelClient.close(); // close the client
    }


Console output:
Document exists...
{"age":21,"name":"一宿君"}
{"_index":"yxj_index","_type":"_doc","_id":"1","_version":1,"_seq_no":0,"_primary_term":1,"found":true,"_source":{"age":21,"name":"一宿君"}}

3. Update a document

    @Test
    void testUpdateDocument() throws IOException {
        // Create the update request
        UpdateRequest request = new UpdateRequest("yxj_index", "1");
        // Create the new data
        User user = new User("一宿君Java", 19);
        // Put the data into the request as JSON
        request.doc(JSON.toJSONString(user), XContentType.JSON);
        // Send the request
        UpdateResponse response = restHighLevelClient.update(request, RequestOptions.DEFAULT);

        System.out.println(response.status()); // whether the update succeeded
        restHighLevelClient.close(); // close the client
    }

Console output:
OK


4. Delete a document

    @Test
    void testDeleteDocument() throws IOException {
        // Create the delete request
        DeleteRequest request = new DeleteRequest("yxj_index", "1");

        // Send the request
        DeleteResponse response = restHighLevelClient.delete(request, RequestOptions.DEFAULT);

        System.out.println(response.status()); // whether the deletion succeeded
        restHighLevelClient.close(); // close the client
    }

Console output:
OK


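5. Bulk insert documents

    @Test
    void testBulkInsertDocument() throws IOException {
        // Create the bulk request
        BulkRequest request = new BulkRequest();
        request.timeout("1s");

        // Build the list of documents
        List<User> userList = new ArrayList<>();
        userList.add(new User("一宿君1", 1));
        userList.add(new User("一宿君2", 2));
        userList.add(new User("一宿君3", 3));
        userList.add(new User("一宿君4", 4));
        userList.add(new User("一宿君5", 5));
        userList.add(new User("一宿君6", 6));

        // Add one index request per document (ids 1..6 in bulk_index)
        for (int i = 0; i < userList.size(); i++) {
            request.add(new IndexRequest("bulk_index")
                    .id("" + (i + 1))
                    .source(JSON.toJSONString(userList.get(i)), XContentType.JSON));
        }

        // Send the bulk request
        BulkResponse response = restHighLevelClient.bulk(request, RequestOptions.DEFAULT);
        System.out.println(response.hasFailures()); // false means every item succeeded
        restHighLevelClient.close(); // close the client
    }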
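Under the hood, a bulk insert is a single POST /_bulk. Its body is newline-delimited JSON: one action line and one source line per document, with every line ending in "\n", including the last one. The JDK-only sketch below builds that body so the format is visible. The Doc class is a hypothetical stand-in for User. The hand-rolled JSON is for illustration only; real code should keep using a JSON library such as fastjson.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the NDJSON body sent to POST /_bulk: an action line plus a source line per document.
class BulkBodySketch {

    // Hypothetical minimal stand-in for the article's User entity
    static final class Doc {
        final String name;
        final int age;
        Doc(String name, int age) { this.name = name; this.age = age; }
    }

    // Escapes backslashes and double quotes only; enough for simple names, not a full JSON escaper
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static String bulkBody(String index, List<Doc> docs) {
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < docs.size(); i++) {
            Doc d = docs.get(i);
            // Action line: index this document with id i + 1, like .id("" + (i + 1))
            body.append("{\"index\":{\"_index\":\"").append(escape(index))
                .append("\",\"_id\":\"").append(i + 1).append("\"}}\n");
            // Source line: the document itself
            body.append("{\"name\":\"").append(escape(d.name))
                .append("\",\"age\":").append(d.age).append("}\n");
        }
        return body.toString();
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(new Doc("user1", 1), new Doc("user2", 2));
        System.out.print(bulkBody("bulk_index", docs));
    }
}
```

This pairing is why the client adds one IndexRequest per document rather than one request holding a JSON array.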


6. Conditional search

    @Test
    void testHasConditionSearch() throws IOException {
        // Create the search request
        SearchRequest request = new SearchRequest();
        // Build the search source (the query conditions)
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        // Match query on the name field (a term query is shown commented out)
        MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("name", "一宿君");
        //TermQueryBuilder queryBuilder = QueryBuilders.termQuery("name", "一宿君");

        // Put the query into the search source
        searchSourceBuilder.query(matchQueryBuilder);
        // Enable highlighting (no fields are configured yet, so the hits carry no highlight fragments)
        searchSourceBuilder.highlighter(new HighlightBuilder());
        // Paging: from is the offset of the first hit (0), size is the number of hits returned (3)
        searchSourceBuilder.from(0);
        searchSourceBuilder.size(3);

        // Attach the search source to the request
        request.source(searchSourceBuilder);

        // Target index; without it the search runs across all indices
        request.indices("bulk_index");

        // Send the request
        SearchResponse response = restHighLevelClient.search(request, RequestOptions.DEFAULT);

        System.out.println(response.status()); // search status
        System.out.println(response); // full response

        // Get the result set and iterate over it
        SearchHits hits = response.getHits(); // the whole hits section, including metadata
        System.out.println(JSON.toJSONString(hits)); // serialize the hits to JSON
        System.out.println("============================================================");

        // The inner hits array holds the actual documents
        for (SearchHit documentFields : hits.getHits()) {
            System.out.println(documentFields.getSourceAsString()); // document as a string
            //System.out.println(documentFields.getSourceAsMap()); // document as a Map
        }

        restHighLevelClient.close(); // close the client
    }

Console output:
OK
{"took":19,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":6,"relation":"eq"},"max_score":0.22232392,"hits":[{"_index":"bulk_index","_type":"_doc","_id":"1","_score":0.22232392,"_source":{"age":1,"name":"一宿君1"}},{"_index":"bulk_index","_type":"_doc","_id":"2","_score":0.22232392,"_source":{"age":2,"name":"一宿君2"}},{"_index":"bulk_index","_type":"_doc","_id":"3","_score":0.22232392,"_source":{"age":3,"name":"一宿君3"}}]}}
{"fragment":true,"hits":[{"fields":{},"fragment":false,"highlightFields":{},"id":"1","matchedQueries":[],"primaryTerm":0,"rawSortValues":[],"score":0.22232392,"seqNo":-2,"sortValues":[],"sourceAsMap":{"name":"一宿君1","age":1},"sourceAsString":"{\"age\":1,\"name\":\"一宿君1\"}","sourceRef":{"fragment":true},"type":"_doc","version":-1},{"fields":{},"fragment":false,"highlightFields":{},"id":"2","matchedQueries":[],"primaryTerm":0,"rawSortValues":[],"score":0.22232392,"seqNo":-2,"sortValues":[],"sourceAsMap":{"name":"一宿君2","age":2},"sourceAsString":"{\"age\":2,\"name\":\"一宿君2\"}","sourceRef":{"fragment":true},"type":"_doc","version":-1},{"fields":{},"fragment":false,"highlightFields":{},"id":"3","matchedQueries":[],"primaryTerm":0,"rawSortValues":[],"score":0.22232392,"seqNo":-2,"sortValues":[],"sourceAsMap":{"name":"一宿君3","age":3},"sourceAsString":"{\"age\":3,\"name\":\"一宿君3\"}","sourceRef":{"fragment":true},"type":"_doc","version":-1}],"maxScore":0.22232392,"totalHits":{"relation":"EQUAL_TO","value":6}}
============================================================
{"age":1,"name":"一宿君1"}
{"age":2,"name":"一宿君2"}
{"age":3,"name":"一宿君3"}
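In searchSourceBuilder.from(0).size(3) above, from is a zero-based offset into the hits, not a page number. A paged endpoint therefore has to convert the page number itself. Below is a minimal JDK-only sketch. It assumes pages are numbered from 1, which is a choice made for illustration, and the class name PageOffset is hypothetical.

```java
// from() takes the zero-based offset of the first hit; size() is the page size.
// Sketch converting a 1-based page number (an assumed convention) into that offset.
class PageOffset {

    static int fromOffset(int pageNum, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int page = Math.max(pageNum, 1); // treat 0 or negative page numbers as the first page
        return (page - 1) * pageSize;
    }

    public static void main(String[] args) {
        int pageSize = 3;
        for (int page = 1; page <= 2; page++) {
            int from = fromOffset(page, pageSize);
            // with the six bulk_index documents, page 1 covers hits 1-3 and page 2 covers hits 4-6
            System.out.println("page " + page + " -> from(" + from + ").size(" + pageSize + ")");
        }
    }
}
```

By default, Elasticsearch rejects a request when from + size exceeds index.max_result_window (10,000). Deep paging needs search_after or the scroll API instead.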

2. ElasticSearch in practice: a JD.com-style home page search with highlighting

2.1 Create the project

Static page resources:

Link: https://pan.baidu.com/s/1L8_NtjVLMmOooK2m-L0Tlw
Extraction code: 9gjc


Configure application.properties:

# Change the port
server.port=9090
# Disable the Thymeleaf cache
spring.thymeleaf.cache=false

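Add the dependencies (pay close attention to the version numbers):

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.2.5.RELEASE</version>
    </parent>
    <groupId>com.wbs</groupId>
    <artifactId>springboot-elasticsearch-jd</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>springboot-elasticsearch-jd</name>

    <properties>
        <java.version>1.8</java.version>
        <elasticsearch.version>7.6.1</elasticsearch.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-thymeleaf</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.70</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <scope>runtime</scope>
            <optional>true</optional>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-configuration-processor</artifactId>
            <optional>true</optional>
        </dependency>

        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>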

Write the IndexController:

@Controller
public class IndexController {

    @RequestMapping({"/","/index"})
    public String toIndex(){
        return "index";
    }
}

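Start the project and open localhost:9090. First make sure the project starts normally and the home page is reachable.

2.2 Basic crawler to fetch data (jsoup)

There are many ways to obtain data:

  • Databases
  • Message queues
  • Caches
  • Crawlers
  • And so on...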

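1. First add the jsoup dependency

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.10.2</version>
</dependency>

2. Search for a product keyword on the JD.com home page

Check the URL in the address bar: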

https://search.jd.com/Search?keyword=Java&enc=utf-8
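Only the keyword parameter changes between searches. Concatenating a raw keyword works for ASCII such as Java, but a Chinese keyword should be percent-encoded first. Below is a JDK-only sketch of building the search URL. The class name JdSearchUrl is illustrative, and the URL shape is the one shown above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Builds the JD search URL for any keyword. URLEncoder percent-encodes non-ASCII text as
// UTF-8 bytes and turns spaces into '+', so Chinese keywords survive the request.
class JdSearchUrl {

    static String searchUrl(String keyword) {
        try {
            return "https://search.jd.com/Search?keyword="
                    + URLEncoder.encode(keyword, "UTF-8") + "&enc=utf-8";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should never happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(searchUrl("Java"));
        System.out.println(searchUrl("\u624b\u673a")); // "手机" (mobile phone)
    }
}
```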


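3. Inspect the page elements: the product list is the element with id J_goodsList, and each product is an li inside it

4. Write a utility class that crawls the data (fetch the page returned by the request and pick out the useful parts)

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.net.URL;

public class HtmlParseUtilTest {
    public static void main(String[] args) throws IOException {
        // 1. Request URL
        String url = "https://search.jd.com/Search?keyword=Java&enc=utf-8";
        // 2. Parse the page with a 30 s timeout (jsoup returns a Document, through which
        //    every HTML element on the page can be accessed)
        Document document = Jsoup.parse(new URL(url), 30000);

        // 3. Get the product list element by the id found while inspecting the page
        Element element = document.getElementById("J_goodsList");

        // 4. Get every li element inside the list
        Elements elements = element.getElementsByTag("li");

        // 5. Read the product attributes under each li: img, price, name, ...
        for (Element el : elements) {
            System.out.println("img-src:" + el.getElementsByTag("img").eq(0).attr("src")); // first image in the li
            System.out.println("name:" + el.getElementsByClass("p-name").eq(0).text()); // product name
            System.out.println("price:" + el.getElementsByClass("p-price").eq(0).text()); // product price
            System.out.println("shopname:" + el.getElementsByClass("hd-shopname").eq(0).text()); // shop name (the publisher, for book results)
            System.out.println("================================================================================================");
        }

    }
}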

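The img-src output above does not contain the real image addresses. Large sites with many images usually render them with deferred loading (lazy loading) to speed up the response, so the actual image URL is kept in a separate attribute.

Change the attribute used for the image to: data-lazy-img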
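If you want a single rule that handles both kinds of markup, you can prefer src and fall back to the lazy-load attribute. The sketch below also turns a protocol-relative value (one starting with //) into an absolute https URL. That lazy-load values are protocol-relative is an assumption, so check the real markup. The helper name pickImageUrl is hypothetical. With jsoup it would be called as pickImageUrl(img.attr("src"), img.attr("data-lazy-img")). jsoup's absUrl("data-lazy-img") should also resolve such values against the page's base URI.

```java
// Sketch: choose the image URL from src, falling back to the lazy-load attribute,
// and make a protocol-relative value ("//host/path") absolute with https.
// Assumption: the lazy-load value may be protocol-relative; verify against the real page.
class ImageUrl {

    static String pickImageUrl(String src, String lazySrc) {
        // jsoup's attr() returns "" for a missing attribute, so treat null and "" alike
        String raw = (src != null && !src.isEmpty()) ? src : lazySrc;
        if (raw == null || raw.isEmpty()) {
            return "";
        }
        return raw.startsWith("//") ? "https:" + raw : raw;
    }

    public static void main(String[] args) {
        System.out.println(pickImageUrl("", "//img.example.com/n7/demo.jpg"));
        System.out.println(pickImageUrl("https://img.example.com/a.png", "//ignored"));
    }
}
```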

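5. Write an entity class to hold the product attributes

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import java.io.Serializable;

@Data
@NoArgsConstructor
@AllArgsConstructor // constructor parameters follow the field order below: name, img, price, shopname
public class Product implements Serializable {
    private String name;
    private String img;
    private String price;
    private String shopname;

    // ...more attributes can be added as needed; only a few key ones are listed here
}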

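6. Modify the page-parsing utility class so it returns the data

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.net.URL;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

public class HtmlParseUtil {
    public static void main(String[] args) throws IOException {
        new HtmlParseUtil().parseJD("Java").forEach(System.out::println);
    }

    public List<Product> parseJD(String keyword) throws IOException {
        // 1. Request URL (the keyword is URL-encoded so non-ASCII keywords such as Chinese work)
        String url = "https://search.jd.com/Search?keyword=" + URLEncoder.encode(keyword, "UTF-8") + "&enc=utf-8";
        // 2. Parse the page (jsoup returns a Document, through which every HTML element can be accessed)
        Document document = Jsoup.parse(new URL(url), 30000);

        // 3. Get the product list element by the id found while inspecting the page
        Element element = document.getElementById("J_goodsList");

        // 4. Create the result list; return it empty if the list element is missing (e.g. the page layout changed)
        List<Product> productList = new ArrayList<>();
        if (element == null) {
            return productList;
        }

        // 5. Get every li element inside the list
        Elements elements = element.getElementsByTag("li");

        // 6. Read img, price, name and shopname under each li and add a Product to the list
        for (Element el : elements) {
            String img = el.getElementsByTag("img").eq(0).attr("data-lazy-img"); // first image in the li
            String name = el.getElementsByClass("p-name").eq(0).text(); // product name
            String price = el.getElementsByClass("p-price").eq(0).text(); // product price
            String shopname = el.getElementsByClass("hd-shopname").eq(0).text(); // shop name (the publisher, for book results)

            // Arguments must follow the field order in Product: name, img, price, shopname
            Product product = new Product(name, img, price, shopname);
            productList.add(product);
        }
        // Return the list
        return productList;
    }

}

Run it and check the output: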

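2.3 Write the service layer interface and implementation class
// Interface (no @Service here: Spring registers the implementation class, not the interface)
public interface ProductService {

    // Crawl data and save it to ES
    public Boolean parseProductSafeEs(String keyword) throws IOException;

    // Paged search
    public List<Map<String, Object>> searchProduct(String keyword, int pageNum, int pageSize) throws IOException;

}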




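// Implementation class
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

@Service
public class ProductServiceImpl implements ProductService {

    @Autowired
    RestHighLevelClient restHighLevelClient;

    @Override
    public Boolean parseProductSafeEs(String keyword) throws IOException {
        // Fetch the data set
        List<Product> productList = new HtmlParseUtil().parseJD(keyword);
        // A bulk request with no items is rejected, so stop early if nothing was crawled
        if (productList.isEmpty()) {
            return false;
        }
        // Create the bulk request
        BulkRequest request = new BulkRequest();
        // Set the timeout to 2 seconds
        request.timeout("2s");

        // Add every product to the bulk request.
        // Ids restart at 1 on every call, so a new crawl overwrites documents with the same ids.
        for (int i = 0; i < productList.size(); i++) {
            request.add(
                    new IndexRequest("jd_pro_index")
                            .id("" + (i + 1))
                            .source(JSON.toJSONString(productList.get(i)), XContentType.JSON)
            );
        }

        // Submit the request
        BulkResponse response = restHighLevelClient.bulk(request, RequestOptions.DEFAULT);
        boolean bool = response.hasFailures(); // true if any item failed, false if all succeeded
        //restHighLevelClient.close(); // left commented out: the client is a shared bean
        return !bool;
    }

    @Override
    public List<Map<String, Object>> searchProduct(String keyword, int pageNum, int pageSize) throws IOException {
        // Pages are numbered from 1
        if (pageNum < 1) {
            pageNum = 1;
        }

        // Create the search request for the product index
        SearchRequest request = new SearchRequest("jd_pro_index");

        // Build the search source
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        // Full-text match query on the name field (the commented-out term query would need an exact token)
        //TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", keyword);
        MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("name", keyword);

        // Put the query into the search source
        searchSourceBuilder.query(matchQueryBuilder);
        // Set the timeout to 60 s
        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        // Paging: from() is the offset of the first hit, so convert the page number
        searchSourceBuilder.from((pageNum - 1) * pageSize);
        searchSourceBuilder.size(pageSize); // hits per page

        // Attach the search source to the request
        request.source(searchSourceBuilder);

        // Send the request and get the response
        SearchResponse response = restHighLevelClient.search(request, RequestOptions.DEFAULT);

        //restHighLevelClient.close(); // left commented out: the client is a shared bean

        // Extract the hits
        SearchHits searchHits = response.getHits();

        // Collect each hit's source into the result list
        List<Map<String, Object>> mapList = new ArrayList<>();
        for (SearchHit hit : searchHits.getHits()) {
            Map<String, Object> hitMap = hit.getSourceAsMap();
            mapList.add(hitMap);
        }

        return mapList;
    }
}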
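The highlighting step (2.7 in the contents) builds on searchProduct. A common approach has two parts. First, configure a HighlightBuilder with the field and tags, for example highlightBuilder.field("name") with preTags(...) and postTags(...). Then, for each hit, overwrite name in hit.getSourceAsMap() with the joined fragments from hit.getHighlightFields().get("name").fragments(). The JDK-only sketch below shows just that merge step. String[] stands in for the Text[] fragments, and the method name applyHighlight is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Merge step for highlighting: copy the hit's source map and overwrite one field with its
// highlighted fragments. String[] stands in for the Text[] returned by HighlightField.fragments().
class HighlightMerge {

    static Map<String, Object> applyHighlight(Map<String, Object> source, String field, String[] fragments) {
        Map<String, Object> result = new HashMap<>(source); // leave the original map untouched
        if (fragments == null || fragments.length == 0) {
            return result; // nothing highlighted in this field: keep the plain value
        }
        StringBuilder joined = new StringBuilder();
        for (String fragment : fragments) {
            joined.append(fragment);
        }
        result.put(field, joined.toString());
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("name", "Java in Action");
        source.put("price", "59.00");
        String[] fragments = {"<span style='color:red'>Java</span> in Action"};
        System.out.println(applyHighlight(source, "name", fragments).get("name"));
    }
}
```

Returning a copy rather than mutating the hit's map keeps the plain value available if the front end ever needs it.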
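2.4 Write the Controller (front-end access layer)

Note: none of these methods may close the RestHighLevelClient. It is a single shared bean, so once it is closed, every other method fails with an IO exception.

import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;

import java.io.IOException;
import java.util.List;
import java.util.Map;

@Controller
public class ProductController {

    @Autowired
    RestHighLevelClient restHighLevelClient;

    @Autowired
    ProductService productService;

    // Create the jd_pro_index index
    @RequestMapping("/createIndex")
    @ResponseBody
    public String creatIndex() throws IOException {
        CreateIndexRequest request = new CreateIndexRequest("jd_pro_index");
        CreateIndexResponse response = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);

        System.out.println(response.isAcknowledged());

        if (response.isAcknowledged()) {
            return "Index created!";
        } else {
            return "Index creation failed!";
        }

    }

    // Delete the jd_pro_index index
    @RequestMapping("/deleteIndex")
    @ResponseBody
    public String deleteIndex() throws IOException {
        DeleteIndexRequest request = new DeleteIndexRequest("jd_pro_index");
        AcknowledgedResponse response = restHighLevelClient.indices().delete(request, RequestOptions.DEFAULT);

        System.out.println(response.isAcknowledged());

        if (response.isAcknowledged()) {
            return "Index deleted!";
        } else {
            return "Index deletion failed!";
        }

    }

    // Crawl JD search results for the keyword and save them to ES
    @RequestMapping("/toSafeEs/{keyword}")
    @ResponseBody
    public String parseProductSafeEs(@PathVariable("keyword") String keyword) throws IOException {
        if (productService.parseProductSafeEs(keyword)) {
            return "Crawled data saved to ES!";
        }
        return "Failed to crawl data";
    }

    // Paged search over the crawled products
    @RequestMapping("/searchEsDoc/{keyword}/{pageNum}/{pageSize}")
    @ResponseBody
    public List<Map<String, Object>> searchProduct(
            @PathVariable("keyword") String keyword,
            @PathVariable("pageNum") int pageNum,
            @PathVariable("pageSize") int pageSize) throws IOException {
        return productService.searchProduct(keyword, pageNum, pageSize);
    }

}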
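2.5 Test the endpoints

Create the index: localhost:9090/createIndex

Crawl data and save it to ES, e.g. localhost:9090/toSafeEs/java

Query the data: localhost:9090/searchEsDoc/{keyword}/{pageNum}/{pageSize}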

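2.6 Front end/back end separation (simple Vue usage)
  • Download vue: renders the front-end page
  • Download axios: sends ajax requests to the back-end endpoints

Both vue and axios can be downloaded from their official sites. Here is a small trick I picked up from 狂神 (Kuangshen): create a local folder with an English name, open cmd in that folder, and install them with npm (Node.js must be installed):

# If the folder has not been initialized before, initialize it first
npm init

# Download vue
npm install vue

# Download axios
npm install axios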



Modify the index.html home page:

[index.html listing not recoverable; page title: 一宿君Java-ES仿京东实战]

Feel free to share; when reposting, please credit the source: 内存溢出 (outofmemory.cn)

Original URL: https://outofmemory.cn/zaji/5618589.html
