- Elasticsearch version: 7.10.1
- elasticsearch-analysis-ik version: 7.10.1
- Python client library for Elasticsearch (elasticsearch-py) version: 7.16.1
The code that rebuilds the index with elasticsearch-analysis-ik is:
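```python
from elasticsearch import Elasticsearch

es = Elasticsearch(['http://192.168.4.10:9200/'])

# Analyze the "title" field with IK, both at index time and at search time
mapping = {
    'properties': {
        'title': {
            'type': 'text',
            'analyzer': 'ik_max_word',
            'search_analyzer': 'ik_max_word'
        }
    }
}

# Drop any existing "news" index, ignoring 400/404 responses
es.indices.delete(index='news', ignore=[400, 404])
# Recreate the index, ignoring 400 (e.g. if it already exists)
es.indices.create(index='news', ignore=400)
result = es.indices.put_mapping(index='news', body=mapping)
print(result)
```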
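In elasticsearch-py 7.x, `indices.create` also accepts the mappings in its request body, so the separate `put_mapping` step can be folded into index creation. The following is a sketch of that variant, assuming the same `news` index and `title` field; `ik_index_body` and `recreate_index` are hypothetical helper names, and the client is passed in rather than created at import time.

```python
from typing import Any, Dict


def ik_index_body(field: str = "title", analyzer: str = "ik_max_word") -> Dict[str, Any]:
    """Build an index-creation body that maps one text field to an IK analyzer."""
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "analyzer": analyzer,
                    "search_analyzer": analyzer,
                }
            }
        }
    }


def recreate_index(es, index: str = "news") -> Dict[str, Any]:
    """Drop the index if it exists, then create it with the mapping in one request.

    `es` is an elasticsearch.Elasticsearch client; a reachable cluster is required.
    """
    es.indices.delete(index=index, ignore=[400, 404])
    return es.indices.create(index=index, body=ik_index_body())
```

Creating the index and its mapping in one request avoids a window in which the index exists without the IK mapping.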
The code that inserts the sample data is:
```python
from elasticsearch import Elasticsearch

es = Elasticsearch(['http://192.168.4.10:9200/'])

datas = [
    {
        'title': '高考结局大不同',
        'url': 'https://k.sina.com.cn/article_7571064628_1c3454734001011lz9.html',
    },
    {
        'title': '进入职业大洗牌时代,“吃香”职业还吃香吗?',
        'url': 'https://new.qq.com/omn/20210828/20210828A025LK00.html',
    },
    {
        'title': '乘风破浪不负韶华,奋斗青春圆梦高考',
        'url': 'http://view.inews.qq.com/a/EDU2021041600732200',
    },
    {
        'title': '他,活出了我们理想的样子',
        'url': 'https://new.qq.com/omn/20210821/20210821A020ID00.html',
    }
]

# Index each record into "news"; "title" is analyzed with ik_max_word
for data in datas:
    es.index(index='news', body=data)
```
Running it produces the following error output:
```
/data/web-spider2/chapter04/4.7/insert_more_data.py:26: DeprecationWarning: The 'body' parameter is deprecated for the 'index' API and will be removed in a future version. Instead use the 'document' parameter. See https://github.com/elastic/elasticsearch-py/issues/1698 for more information
  es.index(index='news', body=data)
Traceback (most recent call last):
  File "/data/web-spider2/chapter04/4.7/insert_more_data.py", line 26, in <module>
    es.index(index='news', body=data)
  File "/root/.virtualenvs/web-spider2/lib/python3.8/site-packages/elasticsearch/client/utils.py", line 347, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/root/.virtualenvs/web-spider2/lib/python3.8/site-packages/elasticsearch/client/__init__.py", line 413, in index
    return self.transport.perform_request(
  File "/root/.virtualenvs/web-spider2/lib/python3.8/site-packages/elasticsearch/transport.py", line 466, in perform_request
    raise e
  File "/root/.virtualenvs/web-spider2/lib/python3.8/site-packages/elasticsearch/transport.py", line 427, in perform_request
    status, headers_response, data = connection.perform_request(
  File "/root/.virtualenvs/web-spider2/lib/python3.8/site-packages/elasticsearch/connection/http_urllib3.py", line 291, in perform_request
    self._raise_error(response.status, raw_data)
  File "/root/.virtualenvs/web-spider2/lib/python3.8/site-packages/elasticsearch/connection/base.py", line 328, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.TransportError: TransportError(500, 'null_pointer_exception', 'Cannot invoke "org.wltea.analyzer.dic.DictSegment.match(char[], int, int)" because "org.wltea.analyzer.dic.Dictionary.singleton._StopWords" is null')
```
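There are two separate problems: a `DeprecationWarning` saying that the `body` parameter of the `index` API is deprecated, and a `TransportError` (HTTP 500, `null_pointer_exception`) raised on the server inside the IK plugin, whose stop-word dictionary (`Dictionary.singleton._StopWords`) is null.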
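Since the null pointer comes from IK's `Dictionary` class rather than from the Python client, one way to narrow it down is to call the analyzer directly through the `_analyze` API (`es.indices.analyze` in elasticsearch-py 7.x), which takes the document structure out of the picture. This is a diagnostic sketch, not something the original post ran: `analyze_tokens` is a hypothetical helper name, and the assumption is that if the IK dictionaries failed to load, this call would fail in a similar way, pointing at the plugin installation rather than the indexing code.

```python
from typing import Any, Dict, List


def analyze_tokens(es, index: str, text: str, analyzer: str = "ik_max_word") -> List[str]:
    """Run the _analyze API with one analyzer and return only the token strings.

    `es` is an elasticsearch.Elasticsearch client (7.x); passing it in keeps
    the response handling independent of any particular cluster.
    """
    resp: Dict[str, Any] = es.indices.analyze(
        index=index, body={"analyzer": analyzer, "text": text}
    )
    # The response has the shape {"tokens": [{"token": ..., "position": ..., ...}, ...]}
    return [t["token"] for t in resp.get("tokens", [])]
```

Usage against the cluster from the post would be `analyze_tokens(es, 'news', '高考结局大不同')`; a list of Chinese words means the analyzer itself works.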
Fix: only the `es.index` call needs to change. The revised code is:
```python
from elasticsearch import Elasticsearch

es = Elasticsearch(['http://192.168.4.10:9200/'])

datas = [
    {
        'title': '高考结局大不同',
        'url': 'https://k.sina.com.cn/article_7571064628_1c3454734001011lz9.html',
    },
    {
        'title': '进入职业大洗牌时代,“吃香”职业还吃香吗?',
        'url': 'https://new.qq.com/omn/20210828/20210828A025LK00.html',
    },
    {
        'title': '乘风破浪不负韶华,奋斗青春圆梦高考',
        'url': 'http://view.inews.qq.com/a/EDU2021041600732200',
    },
    {
        'title': '他,活出了我们理想的样子',
        'url': 'https://new.qq.com/omn/20210821/20210821A020ID00.html',
    }
]

for data in datas:
    # Pass the record via `document` instead of the deprecated `body`.
    # Note: wrapping it as {"doc": data} stores the fields under a top-level
    # "doc" object (doc.title, doc.url) instead of at the top level.
    es.index(index='news', doc_type='_doc', document={"doc": data})
```
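Because each record is wrapped as `{"doc": data}`, its title is stored at the path `doc.title`, not at the top-level `title` field that the mapping assigns to `ik_max_word`. Under default dynamic mapping, a new string field becomes `text` (with a `keyword` sub-field) analyzed by the index's default analyzer, which is `standard` here since the index defines none. That may explain why the error no longer appears, and it can be checked with `es.indices.get_mapping(index='news')` after inserting; passing `document=data` instead would keep `title` at the top level, where the IK mapping applies. The pure-Python sketch below (a hypothetical `field_paths` helper, no client needed) lists the dotted field paths a document produces, which makes the difference visible.

```python
from typing import Any, Dict, List


def field_paths(doc: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return the dotted field paths of a JSON-like document, in key order.

    Nested objects are expanded, so {"doc": {"title": ...}} yields "doc.title";
    Elasticsearch names object sub-fields with the same dotted form.
    """
    paths: List[str] = []
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into objects, extending the prefix with "key."
            paths.extend(field_paths(value, prefix=path + "."))
        else:
            paths.append(path)
    return paths


record = {
    "title": "高考结局大不同",
    "url": "https://k.sina.com.cn/article_7571064628_1c3454734001011lz9.html",
}
print(field_paths(record))           # ['title', 'url']
print(field_paths({"doc": record}))  # ['doc.title', 'doc.url']
```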