This solution requires preprocessing your corpus once. After that, each query is a very fast dictionary lookup.
    from collections import defaultdict
    from stemming.porter2 import stem

    # Preprocessing: group every word in the corpus by its Porter2 stem.
    with open('/usr/share/dict/words') as f:
        words = f.read().splitlines()

    stems = defaultdict(list)
    for word in words:
        word_stem = stem(word)
        stems[word_stem].append(word)

    if __name__ == '__main__':
        # Lookup: stem the query, then fetch every word that shares that stem.
        word = 'leukocyte'
        word_stem = stem(word)
        print(stems[word_stem])
For the /usr/share/dict/words corpus, this produces the result:
['leukocyte', "leukocyte's", 'leukocytes']
It uses the stemming module, which can be installed with:
pip install stemming
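To make the build-once, look-up-many-times pattern easier to experiment with, here is a minimal sketch that wraps it in two functions. It assumes nothing beyond the standard library: `toy_stem`, `build_stem_index`, and `related_words` are hypothetical names introduced for illustration, and `toy_stem` is a deliberately crude stand-in, not the Porter2 algorithm. In practice you would pass `stem` from `stemming.porter2` as `stem_fn` instead.

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical stand-in stemmer: lowercases, then strips a trailing "'s" or "s".

    This is NOT Porter2; it only keeps the sketch free of third-party dependencies.
    """
    word = word.lower()
    for suffix in ("'s", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_stem_index(words, stem_fn=toy_stem):
    """Preprocessing step: map each stem to the list of words that share it."""
    index = defaultdict(list)
    for w in words:
        index[stem_fn(w)].append(w)
    return index


def related_words(index, word, stem_fn=toy_stem):
    """Lookup step: a single dictionary access keyed by the query's stem."""
    # .get() avoids inserting an empty list into the defaultdict on a miss.
    return index.get(stem_fn(word), [])


corpus = ["leukocyte", "leukocyte's", "leukocytes", "lion", "lions"]
index = build_stem_index(corpus)
print(related_words(index, "leukocytes"))
print(related_words(index, "zebra"))
```

One design note: indexing a `defaultdict` directly, as in `stems[word_stem]`, creates an empty entry for every stem that is not found, so the index grows with each failed query. Using `.get()` for lookups keeps the index unchanged.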