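There is only one real way to do this. You have to index your data as keywords and analyze the search text using shingles:
See this reproduction:
First, we'll create two custom analyzers, one keyword-based and one shingle-based: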
PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "asciifolding",
            "lowercase"
          ]
        },
        "my_analyzer_shingle": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "asciifolding",
            "lowercase",
            "shingle"
          ]
        }
      }
    }
  },
  "mappings": {
    "your_type": {
      "properties": {
        "keyword": {
          "type": "string",
          "index_analyzer": "my_analyzer_keyword",
          "search_analyzer": "my_analyzer_shingle"
        }
      }
    }
  }
}
Now, let's create some sample documents using the data you provided:
POST /test/your_type/1
{ "id": 1, "keyword": "thousand eyes" }

POST /test/your_type/2
{ "id": 2, "keyword": "facebook" }

POST /test/your_type/3
{ "id": 3, "keyword": "superdoc" }

POST /test/your_type/4
{ "id": 4, "keyword": "quora" }

POST /test/your_type/5
{ "id": 5, "keyword": "your story" }

POST /test/your_type/6
{ "id": 6, "keyword": "Surgery" }

POST /test/your_type/7
{ "id": 7, "keyword": "lending club" }

POST /test/your_type/8
{ "id": 8, "keyword": "ad roll" }

POST /test/your_type/9
{ "id": 9, "keyword": "the honest company" }

POST /test/your_type/10
{ "id": 10, "keyword": "Draft kings" }
And finally, the query to run the search:
POST /test/your_type/_search
{
  "query": {
    "match": {
      "keyword": "I saw the news of lending club on facebook, your story and quora"
    }
  }
}
This is the result:
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0.009332742,
    "hits": [
      {
        "_index": "test",
        "_type": "your_type",
        "_id": "2",
        "_score": 0.009332742,
        "_source": {
          "id": 2,
          "keyword": "facebook"
        }
      },
      {
        "_index": "test",
        "_type": "your_type",
        "_id": "7",
        "_score": 0.009332742,
        "_source": {
          "id": 7,
          "keyword": "lending club"
        }
      },
      {
        "_index": "test",
        "_type": "your_type",
        "_id": "4",
        "_score": 0.009207102,
        "_source": {
          "id": 4,
          "keyword": "quora"
        }
      },
      {
        "_index": "test",
        "_type": "your_type",
        "_id": "5",
        "_score": 0.0014755741,
        "_source": {
          "id": 5,
          "keyword": "your story"
        }
      }
    ]
  }
}
So what does it do behind the scenes?
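- It indexes your documents as whole keywords (the keyword tokenizer emits the entire string as a single token). I've also added the asciifolding filter, so letters are normalized (i.e. é becomes e), and the lowercase filter (for case-insensitive search). So, for example, Draft kings is indexed as draft kings.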
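- The search analyzer uses the same logic, except that its tokenizer emits individual word tokens and builds shingles (combinations of adjacent tokens) on top of them. Those shingles are what match the keywords indexed in the first step.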
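To make the two steps above concrete, here is a minimal Python sketch that imitates both analyzers outside Elasticsearch. It is only an approximation under stated assumptions: the helper names (`fold`, `keyword_analyze`, `shingle_analyze`) are made up for illustration, `fold` is a rough stand-in for asciifolding plus lowercase, a simple regex stands in for the standard tokenizer, and the shingle step assumes the filter's default settings (unigrams plus two-word shingles).

```python
import re
import unicodedata


def fold(text):
    # Rough stand-in for the asciifolding + lowercase filters:
    # decompose accented characters, drop the combining marks, lowercase.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def keyword_analyze(text):
    # Index side: the whole string becomes a single token.
    return [fold(text)]


def shingle_analyze(text, max_size=2):
    # Search side: split into words (a simplification of the standard
    # tokenizer), then emit every run of 1..max_size adjacent words.
    words = re.findall(r"\w+", fold(text))
    tokens = []
    for start in range(len(words)):
        for size in range(1, max_size + 1):
            if start + size <= len(words):
                tokens.append(" ".join(words[start:start + size]))
    return tokens


keywords = [
    "thousand eyes", "facebook", "superdoc", "quora", "your story",
    "Surgery", "lending club", "ad roll", "the honest company", "Draft kings",
]
query = "I saw the news of lending club on facebook, your story and quora"

# A keyword matches when its single indexed token equals one of the
# tokens produced from the query text.
query_tokens = set(shingle_analyze(query))
matches = [k for k in keywords if keyword_analyze(k)[0] in query_tokens]
print(matches)
```

The sketch finds the same four keywords as the search result above. It also hints at a limitation: with two-word shingles, a three-word keyword such as "the honest company" could never be produced from the query text, so matching it would likely require configuring a larger maximum shingle size on the shingle filter.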