How do I do accent-insensitive search in Elasticsearch using the NEST C# client?

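You can configure an analyzer to analyze the text at index time, index it into a multi_field to use at query time, and keep the original source to return in results. From the problem you describe, it sounds like you want a custom analyzer that uses the asciifolding token filter to convert text to ASCII characters at both index time and search time.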

Given the following document

public class document
{
    public int Id { get; set; }
    public string Name { get; set; }
}

The custom analyzer can be set up when the index is created. We can also specify the mapping at the same time

client.CreateIndex(documentsIndex, ci => ci
    .Settings(s => s
        .NumberOfShards(1)
        .NumberOfReplicas(0)
        .Analysis(analysis => analysis
            .TokenFilters(tokenfilters => tokenfilters
                .AsciiFolding("folding-preserve", ft => ft
                    .PreserveOriginal()
                )
            )
            .Analyzers(analyzers => analyzers
                .Custom("folding-analyzer", c => c
                    .Tokenizer("standard")
                    .Filters("standard", "folding-preserve")
                )
            )
        )
    )
    .Mappings(m => m
        .Map<document>(mm => mm
            .AutoMap()
            .Properties(p => p
                .String(s => s
                    .Name(n => n.Name)
                    .Fields(f => f
                        .String(ss => ss
                            .Name("folding")
                            .Analyzer("folding-analyzer")
                        )
                    )
                    .NotAnalyzed()
                )
            )
        )
    )
);

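Here I've created an index with one shard and no replicas (you may want to change this for your environment), and a custom analyzer, folding-analyzer, that combines the standard tokenizer with the standard token filter and the folding-preserve token filter. The latter performs ASCII folding and stores the original tokens in addition to the folded tokens (more on why this can be useful in a minute).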
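To make the effect of preserving the original token more concrete, here is a minimal, self-contained sketch that does not call Elasticsearch or NEST at all. It approximates ASCII folding by Unicode decomposition and stripping of combining marks, which is an assumption made for illustration; the real asciifolding filter uses its own mapping table and covers more characters. The names FoldingSketch, Fold and FoldPreservingOriginal are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class FoldingSketch
{
    // Approximate ASCII folding: decompose each character (NFD),
    // then drop the combining marks that carried the accent.
    public static string Fold(string token)
    {
        var sb = new StringBuilder();
        foreach (char ch in token.Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(ch);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Mimic the idea of preserve_original: keep the original token and
    // add the folded form only when folding actually changed something.
    public static List<string> FoldPreservingOriginal(string token)
    {
        var tokens = new List<string> { token };
        var folded = Fold(token);
        if (folded != token)
        {
            tokens.Add(folded);
        }
        return tokens;
    }

    public static void Main()
    {
        // "Ayse" has no accents, so only one token is produced.
        Console.WriteLine(string.Join(", ", FoldPreservingOriginal("Ayse")));
        // "Ayşe" produces both the original and the folded token.
        Console.WriteLine(string.Join(", ", FoldPreservingOriginal("Ayşe")));
    }
}
```

In this sketch an unaccented name yields a single token, while an accented name yields two. That asymmetry is what later lets an exact accented match score higher than a folded-only match.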

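I've also mapped the document type, mapping the Name property as a multi_field with a default not_analyzed field (useful for aggregations) and a .folding sub-field that is analyzed with the folding-analyzer. The original source document is also stored by Elasticsearch by default.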

Now let's index some documents

client.Index<document>(new document { Id = 1, Name = "Ayse" });
client.Index<document>(new document { Id = 2, Name = "Ayşe" });

// refresh the index after indexing to ensure the documents just indexed are
// available to be searched
client.Refresh(documentsIndex);

Finally, search for Ayşe

var response = client.Search<document>(s => s
    .Query(q => q
        .QueryString(qs => qs
            .Fields(f => f
                .Field(c => c.Name.Suffix("folding"))
            )
            .Query("Ayşe")
        )
    )
);

yields

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.163388,
    "hits" : [ {
      "_index" : "documents",
      "_type" : "document",
      "_id" : "2",
      "_score" : 1.163388,
      "_source" : {
        "id" : 2,
        "name" : "Ayşe"
      }
    }, {
      "_index" : "documents",
      "_type" : "document",
      "_id" : "1",
      "_score" : 0.3038296,
      "_source" : {
        "id" : 1,
        "name" : "Ayse"
      }
    } ]
  }
}

Two things to highlight here:

Firstly, the _source contains the original text that was sent to Elasticsearch, so by using response.Documents you get the original names. For example,

string.Join(",", response.Documents.Select(d => d.Name));

would give you "Ayşe,Ayse"

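Secondly, remember that we preserved the original token in the asciifolding token filter? Doing so means we can run analyzed queries that match accent-insensitively but still take accents into account when scoring. In the example above, the score for Ayşe matching Ayşe is higher than the score for Ayse matching Ayşe, because the tokens Ayşe and Ayse are both indexed for the former, whereas only Ayse is indexed for the latter. When a query that undergoes analysis is run against the folding sub-field of the Name property, the query is analyzed with the folding-analyzer and a search for matches is performed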

Index time
----------

document 1 name: Ayse --analysis--> Ayse

document 2 name: Ayşe --analysis--> Ayşe, Ayse

Query time
-----------

query_string query input: Ayşe --analysis--> Ayşe, Ayse

search for documents with tokens for name field matching Ayşe or Ayse
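The index-time and query-time flow above can be imitated with a small self-contained sketch. It hard-codes the token lists from the diagram and simply counts how many query tokens each document contains. This counting is only an illustrative assumption about why document 2 ranks first; Elasticsearch computes relevance with its own scoring formula, not a raw token count. MatchSketch and MatchingTokens are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MatchSketch
{
    // Count how many distinct query tokens appear among a document's indexed tokens.
    public static int MatchingTokens(IEnumerable<string> indexed, IEnumerable<string> query)
    {
        var indexedSet = new HashSet<string>(indexed, StringComparer.Ordinal);
        return query.Distinct().Count(t => indexedSet.Contains(t));
    }

    public static void Main()
    {
        // Tokens per document after index-time analysis, copied from the diagram.
        var indexed = new Dictionary<int, string[]>
        {
            { 1, new[] { "Ayse" } },
            { 2, new[] { "Ayşe", "Ayse" } },
        };

        // Tokens produced by analyzing the query input "Ayşe".
        var queryTokens = new[] { "Ayşe", "Ayse" };

        // List documents with the most matching tokens first.
        foreach (var doc in indexed.OrderByDescending(d => MatchingTokens(d.Value, queryTokens)))
        {
            Console.WriteLine(
                $"document {doc.Key}: {MatchingTokens(doc.Value, queryTokens)} matching token(s)");
        }
    }
}
```

Both documents match, so the search is accent-insensitive, but document 2 matches on two tokens and document 1 on one, which mirrors the ordering seen in the response.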


Original URL: http://outofmemory.cn/zaji/5019868.html
