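1. Installation
Huawei Cloud mirrors and other downloads:
ElasticSearch: https://mirrors.huaweicloud.com/elasticsearch/?C=N&O=D
logstash: https://mirrors.huaweicloud.com/logstash/?C=N&O=D (not used in this article)
kibana: https://mirrors.huaweicloud.com/kibana/?C=N&O=D (download the same version as ES)
IK analyzer: https://github.com/medcl/elasticsearch-analysis-ik/releases (download the same version as ES)
head plugin: https://github.com/mobz/elasticsearch-head/archive/master.zip
1.1 head plugin
Download the zip above, or clone the repository over HTTPS (GitHub no longer serves the git:// protocol):
git clone https://github.com/mobz/elasticsearch-head.git
Unzip the archive if you downloaded it, enter the directory, then run:
npm install
npm run start
Open http://localhost:9100 in a browser to reach the head UI.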
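1.2 Elasticsearch installation
Unzip the archive.
Edit the elasticsearch.yml configuration file to allow cross-origin access, so that the head UI on port 9100 can reach ES on port 9200:
http.cors.enabled: true
http.cors.allow-origin: "*"
Start elasticsearch.bat.
Visit http://localhost:9200 to confirm that the node is up.
1.3 Kibana installation
Unzip the archive.
Go to kibana-7.9.2-windows-x86_64\x-pack\plugins\translations\translations and copy the name of the Chinese locale file.
Edit kibana.yml in the config directory and add:
i18n.locale: "zh-CN"
Start kibana.bat.
Visit http://localhost:5601.
1.4 IK analyzer installation
Unzip the archive.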
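Create a folder named ik under elasticsearch/plugins and move the unzipped IK analyzer files into it.
Restart Elasticsearch.
Elasticsearch may exit immediately on restart. The cause turned out to be that the Elasticsearch version the IK plugin depends on did not match the Elasticsearch version in use. Check your Elasticsearch version first and download the IK release with the same version number.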
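The version mismatch described above can be detected before restarting. Elasticsearch plugins ship a plugin-descriptor.properties file whose elasticsearch.version entry names the release the plugin was built for. The following is a minimal sketch that compares that entry with the installed version; the class PluginVersionCheck, the sample descriptor text, and the path plugins/ik/plugin-descriptor.properties are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper; it is not part of Elasticsearch or of the IK plugin.
final class PluginVersionCheck {

    // Returns true when the descriptor's elasticsearch.version equals esVersion.
    static boolean matches(String descriptorText, String esVersion) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(descriptorText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return esVersion.equals(props.getProperty("elasticsearch.version"));
    }

    public static void main(String[] args) {
        // Sample content; in practice read plugins/ik/plugin-descriptor.properties (assumed path).
        String descriptor = "version=7.9.2\nelasticsearch.version=7.9.2\n";
        System.out.println("matches 7.9.2:  " + matches(descriptor, "7.9.2"));
        System.out.println("matches 7.10.0: " + matches(descriptor, "7.10.0"));
    }
}
```

If the check fails, download the IK release whose version equals the Elasticsearch version instead of editing the descriptor.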
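1.4.1 Analyzer test
The ik plugin is loaded when Elasticsearch starts.
Open the Kibana console and run:
GET _analyze
{ "analyzer": "ik_smart", "text": "今天星期六" }

GET _analyze
{ "analyzer": "ik_max_word", "text": "今天星期六" }
Run the two requests separately and compare the tokens each one returns.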
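To get a feel for what coarse and fine granularity mean, here is a toy sketch in plain Java. It is not IK's algorithm and does not use IK's dictionary: the class ToyGranularity, its two methods, and the four-word dictionary are all made up for illustration. The coarse mode does a greedy longest match, the fine mode emits every dictionary word it can find.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy illustration only; the real IK analyzer works differently.
final class ToyGranularity {

    // Fine-grained: emit every dictionary word found at every start position.
    static List<String> fine(String text, Set<String> dict) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (int j = text.length(); j > i; j--) {
                String candidate = text.substring(i, j);
                if (dict.contains(candidate)) {
                    out.add(candidate);
                }
            }
        }
        return out;
    }

    // Coarse-grained: greedy forward longest match; unknown characters become single tokens.
    static List<String> coarse(String text, Set<String> dict) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = i + 1;
            for (int j = text.length(); j > i; j--) {
                if (dict.contains(text.substring(i, j))) {
                    end = j;
                    break;
                }
            }
            out.add(text.substring(i, end));
            i = end;
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary, chosen to fit the sample sentence.
        Set<String> dict = new HashSet<>(Arrays.asList("今天", "星期", "星期六", "六"));
        System.out.println("coarse: " + coarse("今天星期六", dict));
        System.out.println("fine:   " + fine("今天星期六", dict));
    }
}
```

The fine mode returns more, overlapping tokens, which increases recall for term lookups; the coarse mode returns fewer, longer tokens.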
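The two tokenization modes:
ik_max_word: splits the text at the finest granularity and exhausts every possible combination; suited to Term Query.
ik_smart: splits the text at the coarsest granularity, producing only a rough segmentation; suited to Phrase queries.
- You can add a custom dictionary file xxx.dic under the ik plugin directory and register it in IKAnalyzer.cfg.xml, so that the words in your dictionary are kept as whole tokens during analysis.
2. Elasticsearch REST-style operations
REST style overview
Correspondence between Elasticsearch and relational database concepts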
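Create an index
PUT /index_name/type_name/document_id
{ request body }

PUT /test1/type1/1
{ "name": "周六", "age": "3" }
Press ctrl + enter to run the request. Once it succeeds, the index test1 is visible in the head UI on port 9100. Custom type names such as type1 are deprecated in Elasticsearch 7.x and removed in 8.x, where _doc takes their place.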
Elasticsearch field types
You can declare field types when creating an index. For any field without a declared type, ES infers one automatically.
PUT /test2
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "birthday": { "type": "date" }
    }
  }
}
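Modifying a document with PUT
Sending PUT to an existing id replaces the whole document:
PUT /test1/type1/1
{ "name": "周六", "age": "4" }
2.2 post
Updating a document
With POST to the _update endpoint you send only the fields you want to change, and the remaining fields are left as they are.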
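The difference between overwriting a document and partially updating it can be modelled with plain maps. This is a sketch of the semantics only, not of how Elasticsearch stores documents; the class OverwriteVsUpdate and its methods are hypothetical, and the merge here is shallow.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the two write styles, using in-memory maps as documents.
final class OverwriteVsUpdate {

    // PUT (or plain POST) to an existing id: the stored document is replaced by the body.
    static Map<String, Object> index(Map<String, Object> stored, Map<String, Object> body) {
        return new HashMap<>(body);
    }

    // POST .../_update with a "doc" object: the given fields are merged into the stored document.
    static Map<String, Object> update(Map<String, Object> stored, Map<String, Object> doc) {
        Map<String, Object> merged = new HashMap<>(stored);
        merged.putAll(doc);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new HashMap<>();
        stored.put("name", "周六");
        stored.put("age", "4");

        Map<String, Object> change = new HashMap<>();
        change.put("name", "张三");

        System.out.println("after overwrite: " + index(stored, change));  // age is gone
        System.out.println("after update:    " + update(stored, change)); // age is kept
    }
}
```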
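POST /test1/type1/1
{ "name": "张三" }
If the id already exists, a plain POST like this overwrites the document exactly as PUT does, so every field missing from the body is lost. To change only some fields, use _update with a doc object (the typeless form in 7.x is POST /test1/_update/1):
POST /test1/type1/1/_update
{ "doc": { "name": "张三" } }
2.3 delete
Deleting an index or a document
DELETE test2

DELETE test2/type_name/id
2.4 get
Retrieving index and document information
GET test1

GET /test1/type1/1

GET /test1/type1/_search?q=name:小明
2.5 Exact and fuzzy queries
(Add a few documents first for testing.)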
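GET test2/user/_search
{ "query": { "match": { "name": "小明" } } }
A match query analyzes the search text and returns every document that matches any of the resulting terms. Each hit carries a _score; the higher the score, the better the match, and hits are sorted by score in descending order by default.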
Adding result options
Source filtering (return only the age and dec fields):
GET test2/user/_search
{
  "query": { "match": { "name": "小明" } },
  "_source": ["age", "dec"]
}
Sorting (order the hits by age, descending):
GET test2/user/_search
{
  "query": { "match": { "name": "小明" } },
  "sort": [ { "age": { "order": "desc" } } ]
}
Pagination (from is the offset of the first hit, size is the number of hits to return):
GET test2/user/_search
{
  "query": { "match": { "name": "小明" } },
  "sort": [ { "age": { "order": "desc" } } ],
  "from": 0,
  "size": 2
}
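Since from is a zero-based offset and not a page number, a page-based UI has to convert between the two. A minimal sketch of that arithmetic, assuming 1-based page numbers; the class PageOffset is a hypothetical helper.

```java
// Hypothetical helper that converts a 1-based page number into the "from" offset.
final class PageOffset {

    // Offset of the first hit on the given page; page numbers below 1 are treated as page 1.
    static int from(int pageNo, int pageSize) {
        int page = Math.max(pageNo, 1);
        return (page - 1) * pageSize;
    }

    public static void main(String[] args) {
        int pageSize = 2;
        for (int pageNo = 1; pageNo <= 3; pageNo++) {
            System.out.println("page " + pageNo + " -> from=" + from(pageNo, pageSize)
                    + ", size=" + pageSize);
        }
    }
}
```

Passing the page number itself as from would skip the first hit on page 1 and overlap later pages.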
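Note: by default ES does not allow sorting or aggregating on text fields, so the mapping of the age field has to be changed to enable fielddata (which is held in heap memory, so use it sparingly):
PUT /test2/_mapping?pretty
{
  "properties": {
    "age": { "type": "text", "fielddata": true }
  }
}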
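Boolean queries combine several conditions: must corresponds to AND, should corresponds to OR.
must / should query
GET test2/user/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "明" } },
        { "match": { "age": "10" } }
      ]
    }
  }
}
Adding a filter: age greater than or equal to 10 and less than or equal to 20. lt / gt mean less than / greater than; appending e (lte / gte) includes equality. Because this query also contains a filter clause, the should clause becomes optional and only raises the score of the documents it matches.
GET test2/user/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": "明" } }
      ],
      "filter": [
        { "range": { "age": { "gte": 10, "lte": 20 } } }
      ]
    }
  }
}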
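The AND / OR reading of must and should, and the inclusive bounds of gte / lte, can be sketched with java.util.function.Predicate. This models only the matching logic on in-memory values, not scoring; the class BoolSketch and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory model of bool clauses; it ignores scoring entirely.
final class BoolSketch {

    // must: every clause has to match (AND).
    static <T> boolean must(T doc, List<Predicate<T>> clauses) {
        return clauses.stream().allMatch(c -> c.test(doc));
    }

    // should on its own: at least one clause has to match (OR).
    static <T> boolean should(T doc, List<Predicate<T>> clauses) {
        return clauses.stream().anyMatch(c -> c.test(doc));
    }

    // range with gte / lte: both bounds are inclusive.
    static Predicate<Integer> range(int gte, int lte) {
        return v -> v >= gte && v <= lte;
    }

    public static void main(String[] args) {
        List<Predicate<Integer>> clauses = Arrays.asList(range(10, 20), v -> v % 2 == 0);
        for (int age : new int[] {9, 10, 15, 22}) {
            System.out.println(age + ": must=" + must(age, clauses)
                    + ", should=" + should(age, clauses));
        }
    }
}
```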
Matching several terms: this returns documents whose tags contain 游泳 or 足; separate the terms with spaces.
GET /test2/user/_search
{ "query": { "match": { "tags": "游泳 足" } } }
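term exact query: term looks the given term up in the inverted index as is, whereas match analyzes the query text first.
Create an index whose fields use different types:
PUT /test3
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "dec": { "type": "keyword" }
    }
  }
}
Insert data:
PUT /test3/_doc/2
{ "name": "qwe", "dec": "qwe11" }

PUT /test3/_doc/1
{ "name": "qwe", "dec": "qwe1" }
Compare the analyzers:
GET _analyze
{ "analyzer": "keyword", "text": "qwe" }

GET _analyze
{ "analyzer": "standard", "text": "qwe" }
The standard analyzer tokenizes the text, while the keyword analyzer keeps it as a single token.
term query on a keyword field and on a text field:
GET /test3/_search
{ "query": { "term": { "dec": { "value": "qwe" } } } }

GET /test3/_search
{ "query": { "term": { "name": { "value": "qwe" } } } }
The query on name returns hits, the query on dec returns none. A keyword field is not analyzed, so a term query matches it only when the whole value is equal (qwe1 or qwe11 here, not qwe).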
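Why the two term queries behave differently can be sketched with a toy model of what each field type puts into the index. The analyzeText method below is only a rough stand-in for the standard analyzer (lowercase, split on anything that is not a letter or digit), and the class TermLookupSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model of indexed tokens for text and keyword fields.
final class TermLookupSketch {

    // text field: rough approximation of the standard analyzer.
    static List<String> analyzeText(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // keyword field: the whole value is indexed as one unchanged token.
    static List<String> analyzeKeyword(String value) {
        return Collections.singletonList(value);
    }

    // term query: the search term is compared to the indexed tokens without any analysis.
    static boolean termMatches(List<String> indexedTokens, String term) {
        return indexedTokens.contains(term);
    }

    public static void main(String[] args) {
        System.out.println("name=qwe,  term qwe:  " + termMatches(analyzeText("qwe"), "qwe"));
        System.out.println("dec=qwe11, term qwe:  " + termMatches(analyzeKeyword("qwe11"), "qwe"));
        System.out.println("dec=qwe1,  term qwe1: " + termMatches(analyzeKeyword("qwe1"), "qwe1"));
    }
}
```

The same model shows a common pitfall: a term query for "Hello" misses a text field containing "Hello World", because the indexed tokens were lowercased.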
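term query with multiple conditions (should takes an array of clauses):
GET /test3/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { condition 1 } },
        { "term": { condition 2 } }
      ]
    }
  }
}
2.6 Highlighting
GET test2/user/_search
{ "query": { "match": { "name": "小明" } }, "highlight": { "pre_tags": ...
(The rest of this request is truncated in the source.)
Maven properties and dependencies of the Spring Boot project (pom.xml):
<properties>
    <elasticsearch.version>7.9.2</elasticsearch.version>
    <java.version>1.8</java.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-devtools</artifactId>
        <scope>runtime</scope>
        <optional>true</optional>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.76</version>
    </dependency>
</dependencies>
3.2 Create the configuration class
Register a RestHighLevelClient bean in the Spring container: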
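import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchConfig {

    // Exposes a client connected to the local node on port 9200.
    @Bean
    public RestHighLevelClient restHighLevelClient() {
        return new RestHighLevelClient(
                RestClient.builder(new HttpHost("127.0.0.1", 9200, "http")));
    }
}
3.3 Testing the API
Inject restHighLevelClient into the test class:
@Autowired
@Qualifier("restHighLevelClient")
RestHighLevelClient client;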
Create an index
@Test
void createIndex() throws IOException {
    // 1. Build the create-index request
    CreateIndexRequest request = new CreateIndexRequest("tes");
    // 2. Execute the request through the client
    CreateIndexResponse createIndexResponse = client.indices().create(request, RequestOptions.DEFAULT);
    System.out.println(createIndexResponse);
}
Check whether an index exists
@Test
void existIndex() throws IOException {
    GetIndexRequest request = new GetIndexRequest("tes");
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    System.out.println(exists);
}
Delete an index
@Test
void deleteIndex() throws IOException {
    DeleteIndexRequest request = new DeleteIndexRequest("tes");
    AcknowledgedResponse ack = client.indices().delete(request, RequestOptions.DEFAULT);
    // true when the cluster acknowledged the deletion
    System.out.println(ack.isAcknowledged());
}
Document APIs
- Create the User class
@Data
@AllArgsConstructor
@NoArgsConstructor
public class User {
    private String name;
    private int age;
}
Add a document
@Test
void addDoc() throws IOException {
    // Create the object to store
    User user = new User("qwe", 11);
    // Build the request
    IndexRequest request = new IndexRequest("test_api");
    // Set the document id; if omitted, a random id is generated
    request.id("1");
    //request.timeout("60s");
    // Serialize the object to JSON and declare the content type
    request.source(JSON.toJSONString(user), XContentType.JSON);
    IndexResponse indexResponse = client.index(request, RequestOptions.DEFAULT);
    System.out.println(indexResponse.toString());
    System.out.println(indexResponse.status());
}
Check whether a document exists, and fetch it
@Test
void existDoc() throws IOException {
    GetRequest request = new GetRequest("test_api", "1");
    boolean exists = client.exists(request, RequestOptions.DEFAULT);
    System.out.println(exists);
}

@Test
void getDoc() throws IOException {
    GetRequest request = new GetRequest("test_api", "1");
    GetResponse response = client.get(request, RequestOptions.DEFAULT);
    System.out.println(response.getSourceAsString());
}
Delete a document
@Test
void docDel() throws IOException {
    DeleteRequest deleteRequest = new DeleteRequest("test_api", "1");
    DeleteResponse response = client.delete(deleteRequest, RequestOptions.DEFAULT);
    System.out.println(response.status());
}
Bulk insert
@Test
void docBulk() throws IOException {
    BulkRequest request = new BulkRequest();
    ArrayList<User> userArrayList = new ArrayList<>();
    userArrayList.add(new User("zhangsan1", 1));
    userArrayList.add(new User("zhangsan2", 2));
    userArrayList.add(new User("zhangsan3", 3));
    userArrayList.add(new User("zhangsan4", 4));
    // One index action per user, with ids 1..n
    for (int i = 0; i < userArrayList.size(); i++) {
        request.add(new IndexRequest("test_api")
                .id("" + (i + 1))
                .source(JSON.toJSONString(userArrayList.get(i)), XContentType.JSON));
    }
    BulkResponse response = client.bulk(request, RequestOptions.DEFAULT);
    System.out.println(response.status());
}
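At the REST level, a bulk request body is newline-delimited JSON: one action line followed by one source line per document, each terminated by a newline. The sketch below builds such a body by hand to show the shape of what the loop above describes; the high-level client does its own serialization, and the class BulkBodySketch is a hypothetical illustration, not part of the client.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical builder for the body of POST /_bulk, for illustration only.
final class BulkBodySketch {

    // Emits an "index" action line plus the document line for each JSON document, ids 1..n.
    static String bulkBody(String index, List<String> jsonDocs) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < jsonDocs.size(); i++) {
            sb.append("{\"index\":{\"_index\":\"").append(index)
              .append("\",\"_id\":\"").append(i + 1).append("\"}}\n");
            sb.append(jsonDocs.get(i)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "{\"name\":\"zhangsan1\",\"age\":1}",
                "{\"name\":\"zhangsan2\",\"age\":2}");
        System.out.print(bulkBody("test_api", docs));
    }
}
```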
Conditional search
@Test
void docSearch() throws IOException {
    SearchRequest request = new SearchRequest("test_api");
    // Build the search conditions
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // All query types are available through QueryBuilders, for example:
    //   HighlightBuilder      highlighting
    //   MatchAllQueryBuilder  match all documents
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", "zhangsan1");
    searchSourceBuilder.query(termQueryBuilder);
    searchSourceBuilder.timeout(TimeValue.timeValueSeconds(60));
    request.source(searchSourceBuilder);
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    System.out.println(JSON.toJSONString(response.getHits()));
    System.out.println("=========================");
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
}
4. Crawler
4.1 Create a new project; the pom dependencies are the same as above
4.2 Add configuration
server.port=9090
spring.thymeleaf.cache=false
4.3 Import the front-end page resources
ES materials: link: https://pan.baidu.com/s/1PT3jLvCksOhq7kgAKzQm7g extraction code: s824
4.4 Import the jsoup HTML parsing package
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.14.3</version>
</dependency>
4.5 Create the parsing utility class
public class HTMLParseUtil {
    public static void main(String[] args) throws IOException {
        // 1. Request URL: https://search.jd.com/Search?keyword=java
        String url = "https://search.jd.com/Search?keyword=java";
        // 2. Parse the page (the Document returned by Jsoup corresponds to the browser's document object)
        Document document = Jsoup.parse(new URL(url), 30000);
        // 3. Locate the product list by its element id
        Element element = document.getElementById("J_goodsList");
        //System.out.println(element.html());
        // Collect every li element in the list
        Elements elements = element.getElementsByTag("li");
        for (Element el : elements) {
            // The site lazy-loads images, so the URL is in data-lazy-img rather than src
            //String img = el.getElementsByTag("img").eq(0).attr("src");
            String img = el.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String price = el.getElementsByClass("p-price").eq(0).text();
            String name = el.getElementsByClass("p-name").eq(0).text();
            System.out.println("==============================");
            System.out.println(price);
            System.out.println(name);
            System.out.println(img);
        }
    }
}
4.6 Create the entity class Content to hold the parsed data
@Data
@AllArgsConstructor
@NoArgsConstructor
public class Content {
    private String name;
    private String price;
    private String img;
}
4.7 Modify the parsing method
public class HTMLParseUtil {

    public static void main(String[] args) throws IOException {
        new HTMLParseUtil().ParseJD("vue").forEach(System.out::println);
    }

    public List<Content> ParseJD(String keyword) throws IOException {
        // 1. Build the request URL, e.g. https://search.jd.com/Search?keyword=java
        String url = "https://search.jd.com/Search?keyword=" + keyword;
        // 2. Parse the page (the Document returned by Jsoup corresponds to the browser's document object)
        Document document = Jsoup.parse(new URL(url), 30000);
        // 3. Locate the product list by its element id
        Element element = document.getElementById("J_goodsList");
        ArrayList<Content> contents = new ArrayList<>();
        // Collect every li element in the list
        Elements elements = element.getElementsByTag("li");
        for (Element el : elements) {
            // The site lazy-loads images, so the URL is in data-lazy-img rather than src
            String img = el.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String price = el.getElementsByClass("p-price").eq(0).text();
            String name = el.getElementsByClass("p-name").eq(0).text();
            contents.add(new Content(name, price, img));
        }
        return contents;
    }
}
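The method above appends the keyword to the URL unchanged, which is fine for java or vue but not for keywords containing spaces or Chinese characters. One possible refinement is to encode the keyword first; the sketch below uses the standard java.net.URLEncoder, and the class SearchUrl is a hypothetical helper that is not part of the original code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that builds the search URL with an encoded keyword.
final class SearchUrl {

    static String build(String keyword) {
        try {
            // Form-style encoding: spaces become '+', non-ASCII bytes become %XX in UTF-8.
            return "https://search.jd.com/Search?keyword=" + URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this branch is not expected to run.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("java"));
        System.out.println(build("java 入门"));
    }
}
```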
4.7 Bulk-store the results in ES
Service layer method:
public Boolean parseContent(String keyword) throws IOException {
    List<Content> contents = new HTMLParseUtil().ParseJD(keyword);
    BulkRequest bulkRequest = new BulkRequest();
    bulkRequest.timeout("2m");
    // One index action per parsed product; ES generates the ids
    for (Content content : contents) {
        bulkRequest.add(new IndexRequest("jd_goods")
                .source(JSON.toJSONString(content), XContentType.JSON));
    }
    BulkResponse response = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
    return !response.hasFailures();
}
Controller that calls it:
@RestController
public class ContentController {

    @Autowired
    ContentService contentService;

    @GetMapping("/parse/{keyword}")
    public Boolean contentAdd(@PathVariable("keyword") String keyword) throws IOException {
        return contentService.parseContent(keyword);
    }
}
Visit http://localhost:9090/parse/vue in a browser; a response of true means the documents were added.
4.8 Implementing search
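Service layer method:
public List<Map<String, Object>> searchPage(String keyword, int pageNo, int pageSize) throws IOException { ... }
Controller layer:
@GetMapping("/search/{keyword}/{pageNo}/{pageSize}")
public List<Map<String, Object>> searchPage(@PathVariable String keyword,
                                            @PathVariable int pageNo,
                                            @PathVariable int pageSize) throws IOException {
    return contentService.searchPage(keyword, pageNo, pageSize);
}
4.8.1 Include vue.min.js and axios.min.js
Modify the front-end page to load Vue and bind the data.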
Page title: 狂神说Java-ES仿京东实战
Each search result is rendered through the Vue bindings {{result.price}} and {{result.name}}.
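4.9 Highlighted search
Modify the search method so that the name field of each hit is replaced by the highlighted version returned by ES, with the highlight tags in place; the front end then renders it as HTML.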
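The replacement step can be sketched in isolation with plain Java collections, leaving the Elasticsearch types out. The class HighlightMerge is hypothetical, and the em tags in the sample data are placeholder assumptions, not the tags used by the original project.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper showing how highlight fragments replace a field of the source map.
final class HighlightMerge {

    // Returns a copy of source in which field holds the joined fragments, if there are any.
    static Map<String, Object> apply(Map<String, Object> source, String field, String[] fragments) {
        Map<String, Object> result = new HashMap<>(source);
        if (fragments != null && fragments.length > 0) {
            result.put(field, String.join("", fragments));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("name", "Java book");
        source.put("price", "10");

        // Placeholder fragments, as a highlighter might return them.
        String[] fragments = {"<em>Java</em>", " book"};
        System.out.println(apply(source, "name", fragments));
        // Hits without a highlight for the field keep their original value.
        System.out.println(apply(source, "name", null));
    }
}
```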
public List<Map<String, Object>> searchPageHL(String keyword, int pageNo, int pageSize) throws IOException {
    if (pageNo < 1) {
        pageNo = 1;
    }
    // Build the search request
    SearchRequest searchRequest = new SearchRequest("jd_goods");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Pagination: from is the offset of the first hit, not the page number
    searchSourceBuilder.from((pageNo - 1) * pageSize);
    searchSourceBuilder.size(pageSize);
    // Highlighting
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.field("name");
    highlightBuilder.preTags("");  // opening HTML tag (its value is missing in the source)
    highlightBuilder.postTags(""); // closing HTML tag (its value is missing in the source)
    highlightBuilder.requireFieldMatch(false); // false: highlight the field even if the query did not target it
    searchSourceBuilder.highlighter(highlightBuilder);
    // Exact (term) query
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", keyword);
    searchSourceBuilder.query(termQueryBuilder);
    searchSourceBuilder.timeout(TimeValue.timeValueSeconds(10));
    // Execute the search
    searchRequest.source(searchSourceBuilder);
    SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    // Collect the results
    ArrayList<Map<String, Object>> list = new ArrayList<>();
    for (SearchHit hit : response.getHits().getHits()) {
        Map<String, HighlightField> highlightFieldMap = hit.getHighlightFields();
        HighlightField name = highlightFieldMap.get("name");
        Map<String, Object> sourceAsMap = hit.getSourceAsMap(); // the original document
        if (name != null) {
            // Join the highlighted fragments and overwrite the plain name
            Text[] fragments = name.fragments();
            String n_name = "";
            for (Text t : fragments) {
                n_name += t;
            }
            sourceAsMap.put("name", n_name);
        }
        list.add(sourceAsMap);
    }
    return list;
}
5. Done
This article is based on the 狂神说 Elasticsearch tutorial on Bilibili.
Please credit the source when reposting.