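Evaluation metrics for multi-label classification fall into two main groups: label based measures and example based measures.
Label based measures compute the metric once for each class, then combine the per-class results with an averaging method.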
Suppose we have the following data:
expected      predicted
A, C          A, B
C             C
A, B, C       B, C
Convert it with sklearn's MultiLabelBinarizer:
expected      predicted
A B C         A B C
1 0 1         1 1 0
0 0 1         0 0 1
1 1 1         0 1 1
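A minimal sketch of the conversion step, assuming the example data is typed in as Python lists of label names. The variable names (expected, predicted, y_true, y_pred) are illustrative, and the class order is passed explicitly so that both matrices use the columns A, B, C.

```python
from sklearn.preprocessing import MultiLabelBinarizer

expected = [["A", "C"], ["C"], ["A", "B", "C"]]
predicted = [["A", "B"], ["C"], ["B", "C"]]

# Fix the class order so both matrices share the same columns (A, B, C).
mlb = MultiLabelBinarizer(classes=["A", "B", "C"])
y_true = mlb.fit_transform(expected)
y_pred = mlb.transform(predicted)

print(y_true)
print(y_pred)
```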
For class A:
TP = 1 (expected 1, predicted 1)
FP = 0 (expected 0, predicted 1)
TN = 1 (expected 0, predicted 0)
FN = 1 (expected 1, predicted 0)
confusion matrix [[TN, FP], [FN, TP]] = [[1, 0], [1, 1]]
precision = TP / (TP + FP) = 1 / (1+0) = 1
recall = TP / (TP + FN) = 1 / (1+1) = 0.5
f1-score = 2*p*r / (p+r) = 0.667
class B
confusion matrix [[TN, FP], [FN, TP]] = [[1, 1], [0, 1]]
Precision = 0.5
Recall = 1.0
F1-score = 0.667
class C
confusion matrix [[TN, FP], [FN, TP]] = [[0, 0], [1, 2]]
Precision = 1.0
Recall = 0.667
F1-score = 0.8
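The per-class numbers above can be reproduced with a short sketch like the following. It assumes the binarized matrices are re-typed as NumPy arrays (y_true and y_pred are illustrative names) and uses sklearn's multilabel_confusion_matrix, which returns one 2x2 matrix per class laid out as [[TN, FP], [FN, TP]].

```python
import numpy as np
from sklearn.metrics import (
    multilabel_confusion_matrix,
    precision_recall_fscore_support,
)

# Binarized example data, columns are classes A, B, C.
y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
y_pred = np.array([[1, 1, 0], [0, 0, 1], [0, 1, 1]])

# One 2x2 matrix per class: [[TN, FP], [FN, TP]].
cms = multilabel_confusion_matrix(y_true, y_pred)

# average=None returns one value per class instead of a single aggregate.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, average=None
)

for name, cm, p, r, f in zip("ABC", cms, precision, recall, f1):
    print(name, cm.tolist(), round(p, 3), round(r, 3), round(f, 3))
```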
-
macro average
Precision (macro avg) = (Precision of A + Precision of B + Precision of C) / 3 = 0.833
-
micro average (preferred)
Precision (micro avg) = sum(TP) / (sum(TP) + sum(FP)) = (1+1+2) / ((1+1+2) + (0+1+0)) = 4 / 5 = 0.8
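As a rough check of the micro average, the TP and FP counts can be pooled over all classes by hand and compared with sklearn's precision_score. The arrays below are the binarized example data re-typed for this sketch; the variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import precision_score

y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
y_pred = np.array([[1, 1, 0], [0, 0, 1], [0, 1, 1]])

# Pool the counts over every class before dividing.
tp = np.sum((y_true == 1) & (y_pred == 1))
fp = np.sum((y_true == 0) & (y_pred == 1))
micro_by_hand = tp / (tp + fp)

micro_sklearn = precision_score(y_true, y_pred, average="micro")
print(micro_by_hand, micro_sklearn)
```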
-
weighted average
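Precision (weighted avg) = [(Precision of A * support A) + (Precision of B * support B) + (Precision of C * support C)] / (support A + support B + support C) = (1*2 + 0.5*1 + 1*3) / 6 = 5.5 / 6 = 0.917
Here the support of a class is the number of examples whose expected labels include that class.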
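The weighted average can be sketched as a support-weighted mean of the per-class precisions. This assumes support means the number of true occurrences of each class (the column sums of y_true, an illustrative name for the binarized expected labels).

```python
import numpy as np
from sklearn.metrics import precision_score

y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
y_pred = np.array([[1, 1, 0], [0, 0, 1], [0, 1, 1]])

per_class = precision_score(y_true, y_pred, average=None)
support = y_true.sum(axis=0)  # true occurrences of A, B, C

# Weight each class's precision by its support.
weighted_by_hand = np.average(per_class, weights=support)
weighted_sklearn = precision_score(y_true, y_pred, average="weighted")
print(weighted_by_hand, weighted_sklearn)
```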
-
sample average
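Row 1: expected A, C; predicted A, B; precision 1/2 (one of the two predicted labels is correct)
Row 2: expected C; predicted C; precision 1
Row 3: expected A, B, C; predicted B, C; precision 1 (every predicted label is correct)
(1/2 + 1 + 1) / 3 = 5/6 = 0.833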
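The row-by-row calculation can be sketched as follows. The by-hand line assumes every row predicts at least one label (otherwise it would divide by zero); y_true and y_pred are the binarized example data re-typed here.

```python
import numpy as np
from sklearn.metrics import precision_score

y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
y_pred = np.array([[1, 1, 0], [0, 0, 1], [0, 1, 1]])

# Per row: correctly predicted labels / all predicted labels.
row_precision = (y_true & y_pred).sum(axis=1) / y_pred.sum(axis=1)
samples_by_hand = row_precision.mean()

samples_sklearn = precision_score(y_true, y_pred, average="samples")
print(row_precision, samples_by_hand, samples_sklearn)
```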
-
classification_report
To get the per-class values and the averages in one call, use sklearn's classification_report directly.
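A short sketch of that call on the example data. The target_names values are just the class letters used in this post; output_dict=True is shown as an optional way to read individual numbers programmatically.

```python
import numpy as np
from sklearn.metrics import classification_report

y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
y_pred = np.array([[1, 1, 0], [0, 0, 1], [0, 1, 1]])

# Text table with per-class rows plus micro/macro/weighted/samples averages.
print(classification_report(y_true, y_pred, target_names=["A", "B", "C"]))

# Same content as a nested dict, convenient for further processing.
report = classification_report(
    y_true, y_pred, target_names=["A", "B", "C"], output_dict=True
)
print(report["macro avg"]["precision"])
```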
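Example based measures compute the difference between the expected and predicted label sets of each example, then average it over all examples.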
-
hamming loss
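The fraction of labels predicted incorrectly out of the total number of labels (number of examples × number of classes).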
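On the example data this can be sketched as counting the mismatched cells of the two binarized matrices. The arrays are re-typed here for illustration; sklearn's hamming_loss is used as a cross-check.

```python
import numpy as np
from sklearn.metrics import hamming_loss

y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
y_pred = np.array([[1, 1, 0], [0, 0, 1], [0, 1, 1]])

# Mismatched cells / all cells (3 examples x 3 classes).
wrong = np.sum(y_true != y_pred)
hamming_by_hand = wrong / y_true.size

hamming_sklearn = hamming_loss(y_true, y_pred)
print(wrong, hamming_by_hand, hamming_sklearn)
```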
-
subset accuracy
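Also called exact match ratio.
The strictest metric: an example scores 1 only if its predicted label set matches the expected label set exactly, and 0 otherwise.
This ignores partially correct predictions. In scikit-learn, accuracy_score computes subset accuracy for multi-label input.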
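A small sketch of the exact-match check on the example data, where only the second row matches completely. The arrays are the binarized example re-typed here, and the names are illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
y_pred = np.array([[1, 1, 0], [0, 0, 1], [0, 1, 1]])

# A row counts only if every label in it is predicted correctly.
exact_match = np.all(y_true == y_pred, axis=1)
subset_by_hand = exact_match.mean()

subset_sklearn = accuracy_score(y_true, y_pred)
print(exact_match, subset_by_hand, subset_sklearn)
```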
-
example-based accuracy
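For each example, the number of correctly predicted labels divided by the number of labels that are either predicted or expected (the union of the two sets), averaged over all examples.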
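Read this way, the measure is the per-example Jaccard index, so one way to sketch it is by hand with intersection over union, cross-checked against sklearn's jaccard_score with average="samples". Treating the two as equivalent is an assumption of this sketch, and the by-hand line assumes no row has an empty union.

```python
import numpy as np
from sklearn.metrics import jaccard_score

y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
y_pred = np.array([[1, 1, 0], [0, 0, 1], [0, 1, 1]])

# Per row: |expected AND predicted| / |expected OR predicted|.
intersection = (y_true & y_pred).sum(axis=1)
union = (y_true | y_pred).sum(axis=1)
accuracy_by_hand = np.mean(intersection / union)

accuracy_sklearn = jaccard_score(y_true, y_pred, average="samples")
print(intersection, union, accuracy_by_hand, accuracy_sklearn)
```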
-
example-based precision
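For each example, the proportion of predicted labels that are correct, averaged over all examples. This is the same quantity as the sample average computed earlier (0.833 on the example data).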
https://towardsdatascience.com/evaluating-multi-label-classifiers-a31be83da6ea Evaluating Multi-label Classifiers
https://towardsdatascience.com/journey-to-the-center-of-multi-label-classification-384c40229bff Deep dive into multi-label classification…! (With detailed Case Study)
https://medium.datadriveninvestor.com/a-survey-of-evaluation-metrics-for-multilabel-classification-bb16e8cd41cd Evaluation Metrics for Multi-Label Classification