NLTK: Corpus-level BLEU vs. Sentence-level BLEU Score


TL;DR

>>> import nltk
>>> hypothesis = ['This', 'is', 'cat']
>>> reference = ['This', 'is', 'a', 'cat']
>>> references = [reference]  # list of references for 1 sentence.
>>> list_of_references = [references]  # list of references for all sentences in corpus.
>>> list_of_hypotheses = [hypothesis]  # list of hypotheses that corresponds to list of references.
>>> nltk.translate.bleu_score.corpus_bleu(list_of_references, list_of_hypotheses)
0.6025286104785453
>>> nltk.translate.bleu_score.sentence_bleu(references, hypothesis)
0.6025286104785453

(Note: you have to pull the latest version of NLTK from the develop branch to get a stable version of the BLEU score implementation.)


The long version

If your whole corpus contains only one reference and one hypothesis, corpus_bleu() and sentence_bleu() should return the same value, as in the example above.

In the code, we can see that sentence_bleu is effectively a thin wrapper around corpus_bleu:

def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    return corpus_bleu([references], [hypothesis], weights, smoothing_function)
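The single-pair call gives the same number because corpus-level BLEU pools the clipped n-gram counts over all sentence pairs before dividing, instead of averaging per-sentence scores. With one pair there is nothing extra to pool. The following is a minimal pure-Python sketch of that pooling for a single n-gram order. The helper names (ngrams, clipped_counts, pooled_precision) are illustrative and are not NLTK functions, the second sentence pair is made up for the example, and the brevity penalty and the weights are left out.

```python
from collections import Counter
from fractions import Fraction


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_counts(references, hypothesis, n):
    """Return (matched, total) n-gram counts for one hypothesis.

    Each hypothesis n-gram count is clipped by the largest count of that
    n-gram found in any single reference.
    """
    hyp_counts = Counter(ngrams(hypothesis, n))
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    matched = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
    return matched, sum(hyp_counts.values())


def pooled_precision(list_of_references, hypotheses, n):
    """Sum numerators and denominators over the corpus, then divide once.

    Assumes the hypotheses contain at least one n-gram of order n.
    """
    matched = total = 0
    for refs, hyp in zip(list_of_references, hypotheses):
        m, t = clipped_counts(refs, hyp, n)
        matched += m
        total += t
    return Fraction(matched, total)


refs_1 = [["This", "is", "a", "cat"]]
hyp_1 = ["This", "is", "cat"]
refs_2 = [["the", "dog", "barks"]]           # made-up second pair
hyp_2 = ["the", "dog", "barks", "loudly", "today"]

# One pair: the pooled value is just that sentence's own precision.
single = pooled_precision([refs_1], [hyp_1], 2)

# Two pairs: pooled unigram precision vs. the mean of per-sentence precisions.
pooled = pooled_precision([refs_1, refs_2], [hyp_1, hyp_2], 1)
per_sentence = [Fraction(*clipped_counts(r, h, 1))
                for r, h in [(refs_1, hyp_1), (refs_2, hyp_2)]]
averaged = sum(per_sentence) / len(per_sentence)
```

In this sketch the pooled and averaged values differ once the corpus has more than one pair, which is why a corpus-level score is generally not the mean of sentence-level scores.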

And if we look at the parameters of sentence_bleu:

def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    """
    :param references: reference sentences
    :type references: list(list(str))
    :param hypothesis: a hypothesis sentence
    :type hypothesis: list(str)
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The sentence-level BLEU score.
    :rtype: float
    """

The references input of sentence_bleu is a list(list(str)).

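So if you have a sentence string, e.g. "This is a cat", you have to tokenize it to get a list of strings, ["This", "is", "a", "cat"]. And since multiple references are allowed, the input has to be a list of lists of strings. For example, if you have a second reference, "This is a feline", your input to sentence_bleu() would be: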

references = [["This", "is", "a", "cat"], ["This", "is", "a", "feline"]]
hypothesis = ["This", "is", "cat"]
sentence_bleu(references, hypothesis)
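When the sentences start out as raw strings, the nested structure can be built before calling sentence_bleu(). A minimal sketch, assuming plain whitespace splitting is an acceptable tokenizer (a real pipeline would likely use a proper tokenizer). The helper to_tokens is hypothetical and not part of NLTK.

```python
def to_tokens(sentence):
    """Split a raw sentence into tokens on whitespace (a simplifying assumption)."""
    return sentence.split()


raw_references = ["This is a cat", "This is a feline"]
raw_hypothesis = "This is cat"

# list(list(str)): one token list per reference sentence
references = [to_tokens(s) for s in raw_references]
# list(str): a single token list for the hypothesis
hypothesis = to_tokens(raw_hypothesis)
```

The resulting references and hypothesis have the shapes that sentence_bleu(references, hypothesis) expects.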

When it comes to the list_of_references parameter of corpus_bleu(), it is basically a list of whatever sentence_bleu() takes as references:

def corpus_bleu(list_of_references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25),
                smoothing_function=None):
    """
    :param references: a corpus of lists of reference sentences, w.r.t. hypotheses
    :type references: list(list(list(str)))
    :param hypotheses: a list of hypothesis sentences
    :type hypotheses: list(list(str))
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The corpus-level BLEU score.
    :rtype: float
    """
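For a whole corpus, the same nesting is applied once per sentence, and the two lists must line up index by index. The sketch below assumes a hypothetical parallel dataset in which each hypothesis string is paired with a list of reference strings, and it uses whitespace splitting as a stand-in tokenizer. build_corpus_inputs is an illustrative name, not an NLTK function, and the second sentence pair is made up.

```python
def build_corpus_inputs(pairs):
    """Turn (hypothesis_string, [reference_string, ...]) pairs into the
    two parallel lists that a corpus-level BLEU call takes."""
    list_of_references = []  # list(list(list(str)))
    hypotheses = []          # list(list(str))
    for hyp, refs in pairs:
        hypotheses.append(hyp.split())
        list_of_references.append([ref.split() for ref in refs])
    return list_of_references, hypotheses


pairs = [
    ("This is cat", ["This is a cat", "This is a feline"]),
    ("It sat down", ["It sat down"]),  # made-up second sentence
]
list_of_references, hypotheses = build_corpus_inputs(pairs)

# Entry i of list_of_references holds all references for hypotheses[i].
assert len(list_of_references) == len(hypotheses)
```

The two returned lists can then be passed as corpus_bleu(list_of_references, hypotheses).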

Besides the doctests in nltk/translate/bleu_score.py, you can also look at the unit tests in nltk/test/unit/translate/test_bleu_score.py to see how to use each component of bleu_score.py.

By the way, since sentence_bleu is imported as bleu in nltk/translate/__init__.py (https://github.com/nltk/nltk/blob/develop/nltk/translate/__init__.py#L21), using

from nltk.translate import bleu

would be the same as:

from nltk.translate.bleu_score import sentence_bleu

And in code:

>>> from nltk.translate import bleu
>>> from nltk.translate.bleu_score import sentence_bleu
>>> from nltk.translate.bleu_score import corpus_bleu
>>> bleu == sentence_bleu
True
>>> bleu == corpus_bleu
False



Original URL: http://outofmemory.cn/zaji/5642840.html
