Elasticsearch from Beginner to Expert: Installing and Using the Ansj Analyzer

I. Version compatibility

plugin    elasticsearch
7.6.2     7.6.2
7.7.0     7.7.0
7.7.1     7.7.1
7.8.0     7.8.0
7.8.1     7.8.1
7.9.0     7.9.0
7.9.1     7.9.1
7.9.2     7.9.2
7.9.3     7.9.3

II. Installation steps
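1. Download and install the plugin release that matches your ES version

        a. Download the release from GitHub: NLPchina/elasticsearch-analysis-ansj

        b. Unzip elasticsearch-analysis-ansj-7.7.1-release.zip into the plugins directory

        c. Copy ansj.cfg.xml into the ES config directory

        d. Create a library directory at the same level as the ES config directory to hold the segmentation data, and put the dictionary files in it:

custom dictionary (default.dic), stop-word dictionary (stop.dic), ambiguity dictionary (ambiguity.dic), synonym dictionary (synonyms.dic)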
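Steps b through d above can be sketched as a small Python script. This is only an illustration under stated assumptions: the function name install_ansj, the plugin_subdir default, and the idea that the release zip has no top-level folder are all hypothetical and not taken from the plugin documentation, so check the layout of your own archive first.

```python
import shutil
import zipfile
from pathlib import Path

# Dictionary file names taken from the list above.
DICT_FILES = ("default.dic", "stop.dic", "ambiguity.dic", "synonyms.dic")


def install_ansj(es_home, release_zip, cfg_xml, dict_dir,
                 plugin_subdir="elasticsearch-analysis-ansj"):
    """Lay out the Ansj plugin files under es_home (steps b to d)."""
    es_home = Path(es_home)

    # b. Unzip the release into the plugins directory.
    #    ASSUMPTION: the archive has no top-level folder, so it is
    #    extracted into plugins/<plugin_subdir>; adjust if your zip differs.
    plugin_dir = es_home / "plugins" / plugin_subdir
    plugin_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(release_zip) as archive:
        archive.extractall(plugin_dir)

    # c. Copy ansj.cfg.xml into the config directory.
    config_dir = es_home / "config"
    config_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(cfg_xml, config_dir / "ansj.cfg.xml")

    # d. Create library/ next to config/ and copy the dictionaries that exist.
    library_dir = es_home / "library"
    library_dir.mkdir(exist_ok=True)
    copied = []
    for name in DICT_FILES:
        source = Path(dict_dir) / name
        if source.is_file():
            shutil.copy(source, library_dir / name)
            copied.append(name)
    return copied
```

The function returns the names of the dictionaries it actually copied, which makes it easy to spot a missing file before restarting Elasticsearch.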

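2. Restart Elasticsearch

III. Analyzer types

1. Overview of the analyzer types

base_ansj     basic segmentation
index_ansj    index segmentation, the finest-grained split
query_ansj    query segmentation
dic_ansj      user-defined dictionary segmentation
nlp_ansj      natural-language segmentation

2. Example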
POST _analyze
{
  "text": ["美国阿拉斯加州发生8.0级地震"],
  "analyzer": "index_ansj"
}

Result

{
  "tokens" : [
    {
      "token" : "美国",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "ns",
      "position" : 0
    },
    {
      "token" : "美",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "b",
      "position" : 1
    },
    {
      "token" : "国",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "n",
      "position" : 2
    },
    {
      "token" : "阿拉斯加州",
      "start_offset" : 2,
      "end_offset" : 7,
      "type" : "nsf",
      "position" : 3
    },
    {
      "token" : "阿拉斯加",
      "start_offset" : 2,
      "end_offset" : 6,
      "type" : "nsf",
      "position" : 4
    },
    {
      "token" : "阿拉斯",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "nsf",
      "position" : 5
    },
    {
      "token" : "阿拉",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "r",
      "position" : 6
    },
    {
      "token" : "阿",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "b",
      "position" : 7
    },
    {
      "token" : "拉斯",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "nrf",
      "position" : 8
    },
    {
      "token" : "拉",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "v",
      "position" : 9
    },
    {
      "token" : "斯",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "b",
      "position" : 10
    },
    {
      "token" : "加州",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "ns",
      "position" : 11
    },
    {
      "token" : "加",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "v",
      "position" : 12
    },
    {
      "token" : "州",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "n",
      "position" : 13
    },
    {
      "token" : "发生",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "v",
      "position" : 14
    },
    {
      "token" : "发",
      "start_offset" : 7,
      "end_offset" : 8,
      "type" : "v",
      "position" : 15
    },
    {
      "token" : "生",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "v",
      "position" : 16
    },
    {
      "token" : "8.0级",
      "start_offset" : 9,
      "end_offset" : 13,
      "type" : "mq",
      "position" : 17
    },
    {
      "token" : "0",
      "start_offset" : 11,
      "end_offset" : 12,
      "type" : "w",
      "position" : 18
    },
    {
      "token" : "级",
      "start_offset" : 12,
      "end_offset" : 13,
      "type" : "q",
      "position" : 19
    },
    {
      "token" : "地震",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "n",
      "position" : 20
    },
    {
      "token" : "地",
      "start_offset" : 13,
      "end_offset" : 14,
      "type" : "ude2",
      "position" : 21
    },
    {
      "token" : "震",
      "start_offset" : 14,
      "end_offset" : 15,
      "type" : "vi",
      "position" : 22
    }
  ]
}
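In the index_ansj response above, longer words appear together with their shorter pieces, so many tokens overlap. As a rough sketch for reading such output, the helper below keeps only tokens whose span is not contained in a longer token's span. The name coarsest_tokens is hypothetical and not part of the plugin; the sample data is a subset copied from the response.

```python
def coarsest_tokens(tokens):
    """Return tokens whose span is not contained in a longer token's span."""
    kept = []
    for tok in tokens:
        start, end = tok["start_offset"], tok["end_offset"]
        # A token is "covered" if some strictly longer token spans it entirely.
        covered = any(
            other["start_offset"] <= start
            and end <= other["end_offset"]
            and (other["end_offset"] - other["start_offset"]) > (end - start)
            for other in tokens
        )
        if not covered:
            kept.append(tok["token"])
    return kept


# A few tokens copied from the response above.
sample = [
    {"token": "美国", "start_offset": 0, "end_offset": 2},
    {"token": "美", "start_offset": 0, "end_offset": 1},
    {"token": "国", "start_offset": 1, "end_offset": 2},
    {"token": "发生", "start_offset": 7, "end_offset": 9},
    {"token": "发", "start_offset": 7, "end_offset": 8},
    {"token": "生", "start_offset": 8, "end_offset": 9},
]
print(coarsest_tokens(sample))  # ['美国', '发生']
```

The quadratic scan is fine for a single _analyze response; it is meant for inspecting results, not for use inside an indexing pipeline.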
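IV. APIs exposed by Ansj

Endpoint                      Description
/_cat/ansj                    run segmentation
/_cat/ansj/config             show all configuration
/_ansj/flush/config           refresh all configuration
/_ansj/flush/config/single    execute the configuration refresh
/_ansj/flush/dic              update all dictionaries
/_ansj/flush/dic/single       execute the dictionary update

Example: http://127.0.0.1:9200/_ansj/flush/dic/single?key=dic

/_cat/ansj: run segmentation

Example: /_cat/ansj?text=中国&type=index_ansj&dic=dic&stop=stop&ambiguity=ambiguity&synonyms=synonyms

text and type are required: text is the sentence to segment, and type is the segmentation type (for example index_ansj, as in the request above).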
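To make the required and optional parameters of /_cat/ansj concrete, here is a minimal sketch that builds the request URL with Python's standard library. The helper name ansj_cat_url is hypothetical, and it accepts only the optional parameters that appear in the example request; the plugin may support others.

```python
from urllib.parse import urlencode

# Optional dictionary parameters seen in the example request.
OPTIONAL_KEYS = ("dic", "stop", "ambiguity", "synonyms")


def ansj_cat_url(base_url, text, analyzer_type, **options):
    """Build a /_cat/ansj URL; text and type are required."""
    if not text or not analyzer_type:
        raise ValueError("text and type are required")
    unknown = set(options) - set(OPTIONAL_KEYS)
    if unknown:
        raise ValueError(f"unexpected parameters: {sorted(unknown)}")
    # urlencode percent-encodes non-ASCII text such as 中国 as UTF-8.
    params = {"text": text, "type": analyzer_type, **options}
    return f"{base_url.rstrip('/')}/_cat/ansj?{urlencode(params)}"


url = ansj_cat_url("http://127.0.0.1:9200", "中国", "index_ansj",
                   dic="dic", stop="stop")
print(url)
```

Building the query string with urlencode rather than by hand avoids encoding mistakes when the text contains Chinese characters, spaces, or ampersands.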

Feel free to share; if you repost, please credit the source: 内存溢出

Original article: http://outofmemory.cn/zaji/5718064.html
