TL;DR:
>>> import nltk
>>> hypothesis = ['This', 'is', 'cat']
>>> reference = ['This', 'is', 'a', 'cat']
>>> references = [reference]  # list of references for 1 sentence.
>>> list_of_references = [references]  # list of references for all sentences in corpus.
>>> list_of_hypotheses = [hypothesis]  # list of hypotheses that corresponds to list of references.
>>> nltk.translate.bleu_score.corpus_bleu(list_of_references, list_of_hypotheses)
0.6025286104785453
>>> nltk.translate.bleu_score.sentence_bleu(references, hypothesis)
0.6025286104785453
(Note: you have to pull the latest version of NLTK from the develop branch to get a stable version of the BLEU score implementation.)
In long:
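Actually, if there is only one reference and one hypothesis in your whole corpus, both corpus_bleu() and sentence_bleu() should return the same value, as shown in the example above.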
In the code, we see that sentence_bleu is actually a thin wrapper ("duck-type") around corpus_bleu:
def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    return corpus_bleu([references], [hypothesis], weights, smoothing_function)
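To make the wrapping concrete, here is a minimal sketch of just the shape change that happens before the call is delegated. `as_corpus_inputs` is a hypothetical helper written for illustration; it is not part of NLTK and it does not compute any score.

```python
def as_corpus_inputs(references, hypothesis):
    """Wrap one sentence's inputs as a one-sentence corpus.

    Hypothetical helper (not an NLTK function): it mirrors the
    [references], [hypothesis] wrapping shown above.
    """
    return [references], [hypothesis]


references = [["This", "is", "a", "cat"]]  # list(list(str))
hypothesis = ["This", "is", "cat"]         # list(str)

list_of_references, list_of_hypotheses = as_corpus_inputs(references, hypothesis)

print(list_of_references)  # [[['This', 'is', 'a', 'cat']]]
print(list_of_hypotheses)  # [['This', 'is', 'cat']]
```

Each input gains exactly one level of nesting, which is why the two functions agree when the corpus holds a single sentence.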
If we look at the parameters for sentence_bleu:
def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    """
    :param references: reference sentences
    :type references: list(list(str))
    :param hypothesis: a hypothesis sentence
    :type hypothesis: list(str)
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The sentence-level BLEU score.
    :rtype: float
    """
The references input for sentence_bleu is a list(list(str)).
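So if you have a sentence string, e.g. "This is a cat", you have to tokenize it to get a list of strings, ["This", "is", "a", "cat"]. And since multiple references are allowed, the input has to be a list of lists of strings. For example, if you have a second reference, "This is a feline", your input to sentence_bleu() would be: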
references = [["This", "is", "a", "cat"], ["This", "is", "a", "feline"]]
hypothesis = ["This", "is", "cat"]
sentence_bleu(references, hypothesis)
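If you are starting from raw strings, the tokenization step could be sketched as follows. This assumes plain whitespace splitting via `str.split()` is good enough, which is a simplification (a real pipeline would likely use a proper tokenizer), and `to_tokens` is a hypothetical helper name, not an NLTK function.

```python
def to_tokens(sentence):
    """Split a sentence string on whitespace into a list of str.

    Assumption: whitespace tokenization is sufficient for this toy example.
    """
    return sentence.split()


reference_strings = ["This is a cat", "This is a feline"]
hypothesis_string = "This is cat"

# One hypothesis with multiple references: list(list(str)) and list(str).
references = [to_tokens(s) for s in reference_strings]
hypothesis = to_tokens(hypothesis_string)

print(references)  # [['This', 'is', 'a', 'cat'], ['This', 'is', 'a', 'feline']]
print(hypothesis)  # ['This', 'is', 'cat']
```

The resulting `references` and `hypothesis` have the shapes that sentence_bleu(references, hypothesis) expects. Passing the raw strings instead would make each character count as a token, since a Python string is itself a sequence.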
When it comes to the list_of_references parameter of corpus_bleu(), it is basically a list of whatever sentence_bleu() takes as references, one entry per hypothesis:
def corpus_bleu(list_of_references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25),
                smoothing_function=None):
    """
    :param references: a corpus of lists of reference sentences, w.r.t. hypotheses
    :type references: list(list(list(str)))
    :param hypotheses: a list of hypothesis sentences
    :type hypotheses: list(list(str))
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The corpus-level BLEU score.
    :rtype: float
    """
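As a rough illustration of that nesting for more than one sentence, the sketch below builds both corpus-level inputs from raw strings. The second sentence pair and the `corpus` dictionary layout are made up for this example, and whitespace splitting stands in for real tokenization.

```python
# Hypothetical toy corpus: each item has one hypothesis and one or more references.
corpus = [
    {"hypothesis": "This is cat",
     "references": ["This is a cat", "This is a feline"]},
    {"hypothesis": "It is dog",
     "references": ["It is a dog"]},
]

# list(list(list(str))): one list of tokenized references per hypothesis.
list_of_references = [
    [ref.split() for ref in item["references"]] for item in corpus
]

# list(list(str)): one tokenized hypothesis per sentence, in the same order.
hypotheses = [item["hypothesis"].split() for item in corpus]

# The two lists must line up index by index.
assert len(list_of_references) == len(hypotheses)

print(list_of_references[0])  # [['This', 'is', 'a', 'cat'], ['This', 'is', 'a', 'feline']]
print(list_of_references[1])  # [['It', 'is', 'a', 'dog']]
print(hypotheses)             # [['This', 'is', 'cat'], ['It', 'is', 'dog']]
```

These two lists are shaped for corpus_bleu(list_of_references, hypotheses). The number of references may differ from sentence to sentence, as it does here.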
Other than looking at the doctests within nltk/translate/bleu_score.py, you can also take a look at the unit tests in nltk/test/unit/translate/test_bleu_score.py to see how to use each of the components within bleu_score.py.
By the way, since sentence_bleu is imported as bleu in nltk/translate/__init__.py (https://github.com/nltk/nltk/blob/develop/nltk/translate/__init__.py#L21), using
from nltk.translate import bleu
would be the same as:
from nltk.translate.bleu_score import sentence_bleu
and in code:
>>> from nltk.translate import bleu
>>> from nltk.translate.bleu_score import sentence_bleu
>>> from nltk.translate.bleu_score import corpus_bleu
>>> bleu == sentence_bleu
True
>>> bleu == corpus_bleu
False