Return a list of matches for a given phrase

I would approach it as follows:

    import itertools


    def new_phrases(phrase, syns):
        """Generate new phrases from a base phrase and synonyms."""
        # One list of acceptable choices per word; a word with no
        # synonyms maps to a single-element list containing itself.
        words = [syns.get(word, [word]) for word in phrase.split(' ')]
        for t in itertools.product(*words):
            yield ' '.join(t)


    def get_matches(phrase, syns, phrases):
        """Generate acceptable new phrases based on a whitelist."""
        phrases = set(phrases)  # set gives O(1) membership tests
        for new_phrase in new_phrases(phrase, syns):
            if new_phrase in phrases:
                yield new_phrase

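The crux of the code is the assignment of `words` in `new_phrases`, which transforms `phrase` and `syns` into a more usable form: a list in which each element is the list of acceptable choices for the word at that position: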

    >>> [syns.get(word, [word]) for word in phrase.split(' ')]
    [['This'], ['is'], ['a'], ['small', 'tiny', 'little'], ['cottage', 'house']]
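The values of `phrase` and `syns` are not shown in the text. The sketch below assumes inputs reconstructed from the output above (they are a guess, not the original data) and shows how `itertools.product` expands the per-word choices into candidate phrases.

```python
import itertools

# Assumed sample inputs, reconstructed from the outputs shown in the text.
phrase = 'This is a small cottage'
syns = {
    'small': ['small', 'tiny', 'little'],
    'cottage': ['cottage', 'house'],
}

# Same transformation as in new_phrases: one list of choices per word.
words = [syns.get(word, [word]) for word in phrase.split(' ')]

# The Cartesian product picks one choice per position.
candidates = [' '.join(t) for t in itertools.product(*words)]

print(len(candidates))   # 1 * 1 * 1 * 3 * 2 = 6
print(candidates[:2])    # ['This is a small cottage', 'This is a small house']
```

The number of candidates is the product of the lengths of the choice lists, which is why generating them lazily matters once several words have many synonyms.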

Note the following:

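- using generators to handle large numbers of combinations more efficiently (rather than building the whole list at once);
- using a `set` for efficient membership testing (`O(1)`, versus `O(n)` for a list);
- using `itertools.product` to generate the possible combinations of `phrase` based on `syns` (you could also use `itertools.ifilter` in the implementation; it exists only in Python 2, and the built-in `filter` is the Python 3 equivalent); and
- conforming to the style guide.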
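As a rough illustration of the filtering variant mentioned in the list, here is a minimal Python 3 sketch. The name `get_matches_filtered` is hypothetical, and the built-in `filter` stands in for Python 2's `itertools.ifilter`; both are lazy, so no intermediate list is built.

```python
import itertools


def get_matches_filtered(phrase, syns, phrases):
    """Lazily yield candidate phrases that appear in the whitelist.

    Hypothetical variant of get_matches, using the built-in filter.
    """
    allowed = set(phrases)  # O(1) membership tests
    words = [syns.get(word, [word]) for word in phrase.split(' ')]
    # Generator expression: candidates are produced one at a time.
    candidates = (' '.join(t) for t in itertools.product(*words))
    # filter() in Python 3 returns a lazy iterator.
    return filter(allowed.__contains__, candidates)
```

Wrapping the result in `list(...)` forces evaluation; iterating over it directly keeps memory use constant regardless of how many combinations exist.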

In use:

    >>> list(get_matches(phrase, syns, phrases))
    ['This is a small cottage', 'This is a tiny house']

Things to think about:

- What about letter case (e.g. how should `"House of Commons"` be treated)?
- What about punctuation?
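One possible answer to both questions, sketched under assumptions: compare phrases in a normalized form (case-folded, ASCII punctuation removed, whitespace collapsed) while returning the whitelist's original spelling. The helpers `normalize` and `get_matches_normalized` are hypothetical names, and this policy is only one option; it would, for example, treat "House of Commons" and "house of commons" as the same phrase, which may or may not be what you want.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def normalize(text):
    """Case-fold, strip ASCII punctuation, and collapse whitespace."""
    return ' '.join(text.translate(_PUNCT_TABLE).casefold().split())


def get_matches_normalized(candidates, phrases):
    """Yield whitelist phrases whose normalized form matches a candidate."""
    # Map each normalized form back to the whitelist's original spelling.
    lookup = {normalize(p): p for p in phrases}
    for candidate in candidates:
        key = normalize(candidate)
        if key in lookup:
            yield lookup[key]
```

Note that `string.punctuation` includes the apostrophe, so "don't" becomes "dont" under this scheme; a narrower set of characters may suit some data better.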

Source: http://outofmemory.cn/zaji/5427369.html
