I don't think you need WordNet to find proper nouns; I suggest using the part-of-speech tagger pos_tag instead.

To find proper nouns, look for the NNP tag:
from nltk.tag import pos_tag

sentence = "Michael Jackson likes to eat at McDonalds"
tagged_sent = pos_tag(sentence.split())
# [('Michael', 'NNP'), ('Jackson', 'NNP'), ('likes', 'VBZ'), ('to', 'TO'), ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP')]

propernouns = [word for word, pos in tagged_sent if pos == 'NNP']
# ['Michael', 'Jackson', 'McDonalds']
You may not be satisfied with this, though, since Michael and Jackson are split into two tokens; in that case you may need something more complex, such as a named-entity tagger.
As documented in the Penn Treebank tagset (http://www.mozart-oz.org/mogul/doc/lager/brill-tagger/penn.html), possessives carry the POS tag, so in principle you could simply look for it. Often, however, the tagger does not emit a POS tag when the token is an NNP.
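By way of illustration, when the possessive clitic is split off into its own token (as a tokenizer like nltk.word_tokenize would do), you can pick out the noun immediately before each POS tag. The tagged list below is a hand-written example of such output, not an actual tagger run:

```python
# Assumed example of tagged output with the possessive clitic as a separate
# token (as a tokenizer like nltk.word_tokenize would produce); hand-written,
# not an actual tagger run.
tagged = [('Daniel', 'NNP'), ('Jackson', 'NNP'), ("'s", 'POS'), ('hamburger', 'NN')]

# The possessive noun is the token immediately before each POS clitic.
possessive_heads = [tagged[i - 1][0]
                    for i, (word, tag) in enumerate(tagged)
                    if tag == 'POS' and i > 0]
print(possessive_heads)  # ['Jackson']
```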
To find possessive nouns, look for str.endswith("'s") or str.endswith("s'"):
from nltk.tag import pos_tag

sentence = "Michael Jackson took Daniel Jackson's hamburger and Agnes' fries"
tagged_sent = pos_tag(sentence.split())
# [('Michael', 'NNP'), ('Jackson', 'NNP'), ('took', 'VBD'), ('Daniel', 'NNP'), ("Jackson's", 'NNP'), ('hamburger', 'NN'), ('and', 'CC'), ("Agnes'", 'NNP'), ('fries', 'NNS')]

# Iterate over the tokens, not the raw string (iterating a string yields characters).
possessives = [word for word in sentence.split() if word.endswith("'s") or word.endswith("s'")]
# ["Jackson's", "Agnes'"]
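If you also want the base noun back, a small helper (hypothetical, not part of NLTK) can strip the clitic from the matches above:

```python
def strip_possessive(word):
    # Hypothetical helper, not part of NLTK: drop the possessive clitic
    # matched by the endswith checks above.
    if word.endswith("'s"):
        return word[:-2]
    if word.endswith("s'"):
        return word[:-1]
    return word

print([strip_possessive(w) for w in ["Jackson's", "Agnes'"]])  # ['Jackson', 'Agnes']
```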
Alternatively, you can use NLTK's ne_chunk, but it doesn't buy you much unless you care about which kind of proper noun you get from the sentence:
>>> from itertools import chain
>>> from nltk.tree import Tree; from nltk.chunk import ne_chunk
>>> [chunk for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]
[Tree('PERSON', [('Michael', 'NNP')]), Tree('PERSON', [('Jackson', 'NNP')]), Tree('PERSON', [('Daniel', 'NNP')])]
>>> [i[0] for i in list(chain(*[chunk.leaves() for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]))]
['Michael', 'Jackson', 'Daniel']
Using ne_chunk is a bit verbose, and it doesn't get you the possessives.
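When the chunker does group a multi-token name into a single tree, joining each chunk's leaves recovers the full name. The chunks below are built by hand to mirror the ne_chunk output above (with one multi-token PERSON added for illustration):

```python
from nltk.tree import Tree

# Hand-built chunks mirroring the ne_chunk output; a multi-token PERSON is
# included to show how joining leaves recovers the full name.
chunks = [Tree('PERSON', [('Michael', 'NNP'), ('Jackson', 'NNP')]),
          Tree('PERSON', [('Daniel', 'NNP')])]
names = [' '.join(word for word, tag in chunk.leaves()) for chunk in chunks]
print(names)  # ['Michael Jackson', 'Daniel']
```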