You can define these two functions:
    from collections import Counter
    from math import sqrt

    def word2vec(word):
        # count the characters in word
        cw = Counter(word)
        # precompute the set of distinct characters
        sw = set(cw)
        # precompute the "length" (Euclidean norm) of the word vector
        lw = sqrt(sum(c*c for c in cw.values()))
        # return a tuple
        return cw, sw, lw

    def cosdis(v1, v2):
        # which characters are common to the two words?
        common = v1[1].intersection(v2[1])
        # by definition of cosine similarity we have
        return sum(v1[0][ch]*v2[0][ch] for ch in common)/v1[2]/v2[2]
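For reference, the value `cosdis` returns is the cosine similarity of the two character-count vectors (a similarity, not a distance, despite the function name). Writing $u_c$ for the number of times character $c$ occurs in the first word, $v_c$ for the second, and $S_u$, $S_v$ for their sets of distinct characters (`sw` in the code), the numerator only needs the shared characters because every other product is zero, and the two square roots are the precomputed `lw` values. As a worked check, the strings `a` and `b` used in the example below have squared norms 141 and 131 and a dot product of 75 over the shared characters s, a, f, w, e:

```latex
\cos(\mathbf{u},\mathbf{v})
  = \frac{\sum_{c \in S_u \cap S_v} u_c\, v_c}
         {\sqrt{\sum_{c \in S_u} u_c^2}\;\sqrt{\sum_{c \in S_v} v_c^2}}
\qquad
\cos(\mathbf{a},\mathbf{b})
  = \frac{3\cdot 4 + 7\cdot 4 + 4\cdot 5 + 4\cdot 2 + 7\cdot 1}{\sqrt{141}\,\sqrt{131}}
  = \frac{75}{\sqrt{18471}} \approx 0.5518
```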
Then use them as in this example:
    >>> a = 'safasfeqefscwaeeafweeaeawaw'
    >>> b = 'tsafdstrdfadsdfdswdfafdwaed'
    >>> c = 'optykop;lvhopijresokpghwji7'
    >>>
    >>> va = word2vec(a)
    >>> vb = word2vec(b)
    >>> vc = word2vec(c)
    >>>
    >>> print(cosdis(va, vb))
    0.551843662321
    >>> print(cosdis(vb, vc))
    0.113746579656
    >>> print(cosdis(vc, va))
    0.153494378078
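A few properties follow directly from the cosine formula and make handy sanity checks. The minimal sketch below restates the two functions for Python 3 and adds one guard that the original does not have (an assumption of this sketch): an empty string has length 0, so the original `cosdis` would raise `ZeroDivisionError`, whereas this version returns 0.0. The test words `listen`, `silent`, `abc` and `xyz` are illustrative only.

```python
from collections import Counter
from math import sqrt


def word2vec(word):
    """Return (character counts, set of distinct characters, Euclidean length)."""
    cw = Counter(word)
    sw = set(cw)
    lw = sqrt(sum(c * c for c in cw.values()))
    return cw, sw, lw


def cosdis(v1, v2):
    """Cosine similarity of two vectors built by word2vec.

    Assumption (not in the original answer): if either word is empty its
    length is 0, so return 0.0 instead of dividing by zero.
    """
    if v1[2] == 0 or v2[2] == 0:
        return 0.0
    common = v1[1].intersection(v2[1])
    return sum(v1[0][ch] * v2[0][ch] for ch in common) / v1[2] / v2[2]


# Identical words point in the same direction: similarity is 1 (up to rounding).
same = cosdis(word2vec("listen"), word2vec("listen"))
# Anagrams have identical character counts, so they also score 1.
anagram = cosdis(word2vec("listen"), word2vec("silent"))
# Words with no characters in common score exactly 0.
disjoint = cosdis(word2vec("abc"), word2vec("xyz"))
# The measure is symmetric (equal up to floating-point rounding).
a = "safasfeqefscwaeeafweeaeawaw"
b = "tsafdstrdfadsdfdswdfafdwaed"
ab = cosdis(word2vec(a), word2vec(b))
ba = cosdis(word2vec(b), word2vec(a))

print(round(same, 6), round(anagram, 6), disjoint, round(ab, 6))  # 1.0 1.0 0.0 0.551844
```

The anagram case shows the main limitation of this measure: character counts ignore character order entirely, so any two permutations of the same letters are indistinguishable.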
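By the way, the word2vec you mention in your tags is a completely different business: studying it properly would take one of us a great deal of time and effort and, guess what, I'm not that person…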