python – Getting the accuracy for multi-label prediction in scikit-learn

In a multilabel classification setting, sklearn.metrics.accuracy_score only computes the subset accuracy (3): that is, the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true.

This way of computing accuracy is sometimes called, perhaps less ambiguously, the exact match ratio (1).
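To make the definition concrete, here is a minimal NumPy sketch of the exact match ratio. The function name exact_match_ratio and the sample arrays are illustrative assumptions, not part of scikit-learn or of the original question; for binary indicator matrices the result should agree with what accuracy_score reports.

```python
import numpy as np

def exact_match_ratio(y_true, y_pred):
    """Fraction of samples whose whole predicted label row equals the true row."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # A sample counts only if every label in its row agrees.
    return float(np.mean(np.all(y_true == y_pred, axis=1)))

# Illustrative data (an assumption): 4 samples, 3 labels, binary indicator format.
y_true = np.array([[0, 1, 0], [0, 1, 1], [1, 0, 1], [0, 0, 1]])
y_pred = np.array([[0, 1, 1], [0, 1, 1], [0, 1, 0], [0, 0, 0]])

ratio = exact_match_ratio(y_true, y_pred)
print(ratio)  # 0.25: only the second sample matches exactly
```

A prediction that gets two of three labels right earns nothing under this measure, which is why it is often considered too strict for multilabel problems.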

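Is there a way to get the other typical way of computing accuracy in scikit-learn, namely the one defined in (1) and (2), and less ambiguously referred to as the Hamming score (4) (since it is closely related to the Hamming loss), or label-based accuracy?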
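Written out, the measure being asked about is the per-sample ratio of correctly predicted labels to all labels that are either true or predicted, averaged over the samples. The notation here is an assumption chosen to parallel the Hamming loss formula quoted in the answer: |D| is the number of samples, Y_i is the set of true labels of sample i, and X_i is the set of predicted labels.

```latex
$$\text{HammingScore} = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{|Y_i \cap X_i|}{|Y_i \cup X_i|}$$
```

This matches what the hamming_score function in the answer computes, with the extra convention that a sample with no true and no predicted labels scores 1.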

(1) Sorower, Mohammad S. "A literature survey on algorithms for multi-label learning." Oregon State University, Corvallis (2010).

(2) Tsoumakas, Grigorios, and Ioannis Katakis. "Multi-label classification: An overview." Dept. of Informatics, Aristotle University of Thessaloniki, Greece (2006).

(3) Ghamrawi, Nadia, and Andrew McCallum. "Collective multi-label classification." Proceedings of the 14th ACM International Conference on Information and Knowledge Management. ACM, 2005.

(4) Godbole, Shantanu, and Sunita Sarawagi. "Discriminative methods for multi-labeled classification." Advances in Knowledge Discovery and Data Mining. Springer Berlin Heidelberg, 2004. 22-30.

Solution

You can write your own version. Here is an example that ignores the weight and normalize arguments.

    import numpy as np

    # Example data: 4 samples, 3 labels, binary indicator format
    y_true = np.array([[0, 1, 0],
                       [0, 1, 1],
                       [1, 0, 1],
                       [0, 0, 1]])

    y_pred = np.array([[0, 1, 1],
                       [0, 1, 1],
                       [0, 1, 0],
                       [0, 0, 0]])

    def hamming_score(y_true, y_pred, normalize=True, sample_weight=None):
        '''
        Compute the Hamming score (a.k.a. label-based accuracy) for the multi-label case
        https://stackoverflow.com/q/32239577/395857

        normalize and sample_weight are accepted but ignored.
        '''
        acc_list = []
        for i in range(y_true.shape[0]):
            # Indices of the labels that are switched on for sample i
            set_true = set(np.where(y_true[i])[0])
            set_pred = set(np.where(y_pred[i])[0])
            if len(set_true) == 0 and len(set_pred) == 0:
                # No true and no predicted labels: count as a perfect match
                tmp_a = 1
            else:
                # Size of the intersection over size of the union
                tmp_a = len(set_true.intersection(set_pred)) / \
                        float(len(set_true.union(set_pred)))
            acc_list.append(tmp_a)
        return np.mean(acc_list)

    if __name__ == "__main__":
        print('Hamming score: {0}'.format(hamming_score(y_true, y_pred)))  # 0.375 (= (0.5+1+0+0)/4)

        # For comparison sake:
        import sklearn.metrics

        # Subset accuracy
        # 0.25 (= (0+1+0+0) / 4) --> 1 if the prediction for one sample fully matches the gold. 0 otherwise.
        print('Subset accuracy: {0}'.format(sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)))

        # Hamming loss (smaller is better)
        # $$\text{HammingLoss}(x_i,y_i) = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{xor(x_i,y_i)}{|L|},$$
        # where
        #  - \(|D|\) is the number of samples
        #  - \(|L|\) is the number of labels
        #  - \(y_i\) is the ground truth
        #  - \(x_i\) is the prediction.
        # 0.416666666667 (= (1+0+3+1) / (3*4) )
        print('Hamming loss: {0}'.format(sklearn.metrics.hamming_loss(y_true, y_pred)))

Output (Python 3 prints the last value with more digits):

    Hamming score: 0.375
    Subset accuracy: 0.25
    Hamming loss: 0.416666666667
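The per-sample loop can also be expressed with array operations. The following is a sketch only, not part of the original answer: the names hamming_score_vectorized and label_accuracy are hypothetical, and the inputs are assumed to be binary indicator matrices of equal shape. It also shows the quantity that is the direct complement of the Hamming loss, the fraction of individual label slots predicted correctly.

```python
import numpy as np

def hamming_score_vectorized(y_true, y_pred):
    """Mean over samples of |true AND pred| / |true OR pred|."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    intersection = np.logical_and(y_true, y_pred).sum(axis=1)
    union = np.logical_or(y_true, y_pred).sum(axis=1)
    # Convention (same as the loop version): two empty label sets score 1.
    scores = np.ones(len(union), dtype=float)
    nonempty = union > 0
    scores[nonempty] = intersection[nonempty] / union[nonempty]
    return float(scores.mean())

def label_accuracy(y_true, y_pred):
    """Fraction of individual label slots predicted correctly (1 - Hamming loss)."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Illustrative data (an assumption): 4 samples, 3 labels.
y_true = np.array([[0, 1, 0], [0, 1, 1], [1, 0, 1], [0, 0, 1]])
y_pred = np.array([[0, 1, 1], [0, 1, 1], [0, 1, 0], [0, 0, 0]])

score = hamming_score_vectorized(y_true, y_pred)
slot_accuracy = label_accuracy(y_true, y_pred)
print(score)          # 0.375, i.e. (0.5 + 1 + 0 + 0) / 4
print(slot_accuracy)  # 7/12, i.e. 1 - 5/12
```

Recent scikit-learn versions also provide sklearn.metrics.jaccard_score, and calling it with average='samples' computes a per-sample intersection-over-union average that should coincide with this score whenever no sample has both label sets empty; how the empty case is handled there may differ from the convention used here.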


Original address: http://outofmemory.cn/langs/1207351.html
