Python data insert fails with an "org.wltea.analyzer.dic.Dictionary.singleton" error

Environment:
  1. Elasticsearch version: 7.10.1
  2. elasticsearch-analysis-ik version: 7.10.1
  3. Python library used to operate Elasticsearch, version: 7.16.1

Problem:

The code that rebuilds the index with elasticsearch-analysis-ik is as follows:

from elasticsearch import Elasticsearch

es = Elasticsearch(['http://192.168.4.10:9200/'])

# Map the title field to the IK analyzer for both indexing and searching
mapping = {
    'properties': {
        'title': {
            'type': 'text',
            'analyzer': 'ik_max_word',
            'search_analyzer': 'ik_max_word'
        }
    }
}

# Recreate the index from scratch, ignoring "not found" / "already exists" errors
es.indices.delete(index='news', ignore=[400, 404])
es.indices.create(index='news', ignore=400)
result = es.indices.put_mapping(index='news', body=mapping)
print(result)
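
Before inserting any data, it can help to confirm that the ik_max_word analyzer actually responds on this node, since the error shown later comes from the IK plugin's dictionaries. A minimal check, assuming the IK plugin and its dictionary files are deployed on the Elasticsearch node above (the sample text is taken from the data below):

from elasticsearch import Elasticsearch

es = Elasticsearch(['http://192.168.4.10:9200/'])

# Ask the news index to tokenize a sample string with ik_max_word;
# a null_pointer_exception here would also point at the IK dictionaries not being loaded
result = es.indices.analyze(index='news', body={'analyzer': 'ik_max_word', 'text': '乘风破浪不负韶华'})
print([token['token'] for token in result['tokens']])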

The code that inserts the sample data is as follows:

from elasticsearch import Elasticsearch
 
es = Elasticsearch(['http://192.168.4.10:9200/'])
 
datas = [
    {
        'title': '高考结局大不同',
        'url': 'https://k.sina.com.cn/article_7571064628_1c3454734001011lz9.html',
    },
    {
        'title': '进入职业大洗牌时代,“吃香”职业还吃香吗?',
        'url': 'https://new.qq.com/omn/20210828/20210828A025LK00.html',
    },
    {
        'title': '乘风破浪不负韶华,奋斗青春圆梦高考',
        'url': 'http://view.inews.qq.com/a/EDU2021041600732200',
    },
    {
        'title': '他,活出了我们理想的样子',
        'url': 'https://new.qq.com/omn/20210821/20210821A020ID00.html',
    }
]
 
for data in datas:
    es.index(index='news', body=data)

Then the following error message is produced:

/data/web-spider2/chapter04/4.7/insert_more_data.py:26: DeprecationWarning: The 'body' parameter is deprecated for the 'index' API and will be removed in a future version. Instead use the 'document' parameter. See https://github.com/elastic/elasticsearch-py/issues/1698 for more information
  es.index(index='news', body=data)
Traceback (most recent call last):
  File "/data/web-spider2/chapter04/4.7/insert_more_data.py", line 26, in 
    es.index(index='news', body=data)
  File "/root/.virtualenvs/web-spider2/lib/python3.8/site-packages/elasticsearch/client/utils.py", line 347, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/root/.virtualenvs/web-spider2/lib/python3.8/site-packages/elasticsearch/client/__init__.py", line 413, in index
    return self.transport.perform_request(
  File "/root/.virtualenvs/web-spider2/lib/python3.8/site-packages/elasticsearch/transport.py", line 466, in perform_request
    raise e
  File "/root/.virtualenvs/web-spider2/lib/python3.8/site-packages/elasticsearch/transport.py", line 427, in perform_request
    status, headers_response, data = connection.perform_request(
  File "/root/.virtualenvs/web-spider2/lib/python3.8/site-packages/elasticsearch/connection/http_urllib3.py", line 291, in perform_request
    self._raise_error(response.status, raw_data)
  File "/root/.virtualenvs/web-spider2/lib/python3.8/site-packages/elasticsearch/connection/base.py", line 328, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.TransportError: TransportError(500, 'null_pointer_exception', 'Cannot invoke "org.wltea.analyzer.dic.DictSegment.match(char[], int, int)" because "org.wltea.analyzer.dic.Dictionary.singleton._StopWords" is null')

Two things are happening here: a warning that the body parameter is deprecated, and a TransportError returned by the server.
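
For the deprecation warning by itself, the message suggests passing the data through the document parameter instead of body. A minimal sketch of that change, reusing the datas list defined above; note that this only silences the client-side warning and does nothing about the server-side null_pointer_exception:

from elasticsearch import Elasticsearch

es = Elasticsearch(['http://192.168.4.10:9200/'])

# datas is the same sample list shown above
for data in datas:
    es.index(index='news', document=data)  # 'document' replaces the deprecated 'body'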

Solution:

To fix this problem, only the es.index part of the code needs to be modified, as follows:

from elasticsearch import Elasticsearch
 
es = Elasticsearch(['http://192.168.4.10:9200/'])

datas = [
    {
        'title': '高考结局大不同',
        'url': 'https://k.sina.com.cn/article_7571064628_1c3454734001011lz9.html',
    },
    {
        'title': '进入职业大洗牌时代,“吃香”职业还吃香吗?',
        'url': 'https://new.qq.com/omn/20210828/20210828A025LK00.html',
    },
    {
        'title': '乘风破浪不负韶华,奋斗青春圆梦高考',
        'url': 'http://view.inews.qq.com/a/EDU2021041600732200',
    },
    {
        'title': '他,活出了我们理想的样子',
        'url': 'https://new.qq.com/omn/20210821/20210821A020ID00.html',
    }
]

for data in datas:
    # Pass the data through the document parameter instead of the deprecated body
    es.index(index='news', doc_type='_doc', document={"doc": data})
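
After the loop finishes, a quick search can confirm that the documents were indexed. A minimal sketch, assuming the same 7.16.1 client as above, which accepts query as a direct parameter; note that with the wrapper used above, each stored document has the shape {"doc": {...}}:

from elasticsearch import Elasticsearch

es = Elasticsearch(['http://192.168.4.10:9200/'])

# Make the freshly indexed documents visible to search immediately
es.indices.refresh(index='news')

# Fetch everything back and print the stored sources
result = es.search(index='news', query={'match_all': {}})
print(result['hits']['total'])
for hit in result['hits']['hits']:
    print(hit['_source'])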
