Daily report 2022-05-02

2022-05-02 Implementations of several bag mapping algorithms

Cluster centers: $C = \{c_1,...,c_k\}$; bag: $\mathbf{B}_i=\{b_{i1},...,b_{im}\}$
1. Shortest-distance mapping
The mapping is defined as follows, where the subscript of $h$ and $c$ indexes the cluster centers:
$h_i=\underset{j}{\max}\,\exp\left(- \frac{\|b_{ij}-c_i \|}{\sigma^2}\right)$
The final mapped vector of $\mathbf{B}_i$ is $H(\mathbf{B}_i) = [h_1,...,h_k]$

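        # Diverse-density mapping: one entry per cluster center
        ret_vec = np.ones(self.k_m)
        # Feature columns of the instances in bag idx
        instances = self.bags[idx, 0][:, :self.dimensions - 1]
        for index, item in enumerate(centers):
            # Distance from this center to its nearest instance in the bag
            ret_vec[index] = np.linalg.norm(item - instances, axis=1).min()
        # exp(-d) decreases with d, so the nearest instance gives the maximum (sigma = 1 here)
        ret_vec = np.exp(-ret_vec)
        # Normalize by the Euclidean distance to the origin, i.e. the L2 norm
        return ret_vec / dis_euclidean(ret_vec, np.zeros_like(ret_vec))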
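
The fragment above depends on class attributes that are not shown here. As a rough standalone sketch of the same idea, the function below maps one bag to a k-dimensional vector. The function name, the `sigma` parameter default, and the use of `np.linalg.norm` in place of `dis_euclidean` are assumptions made for illustration, not part of the original code.

```python
import numpy as np


def shortest_distance_mapping(bag, centers, sigma=1.0):
    """Map one bag (m x d array) to a vector with one entry per center.

    Hypothetical helper: h_t = max_j exp(-||b_j - c_t|| / sigma^2),
    which equals exp(-min_j ||b_j - c_t|| / sigma^2) because exp(-x)
    is decreasing in x.
    """
    bag = np.asarray(bag, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise distances, shape (k, m): row t holds the distances
    # from center t to every instance in the bag.
    dists = np.linalg.norm(centers[:, None, :] - bag[None, :, :], axis=2)
    h = np.exp(-dists.min(axis=1) / sigma ** 2)
    # L2-normalize the result (assumed to match what dis_euclidean
    # against the zero vector computes in the fragment above).
    return h / np.linalg.norm(h)


bag = np.array([[0.0, 0.0], [3.0, 4.0]])
centers = np.array([[0.0, 0.0], [0.0, 4.0]])
vec = shortest_distance_mapping(bag, centers)
```

In this toy bag the first center coincides with an instance (distance 0) and the second center is 3 away from its nearest instance, so the two entries are in the ratio 1 : exp(-3) before normalization.
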

2. De-centering mapping
The instances in $\mathbf{B}_i$ are partitioned by $C$ into $K$ classes, $K\le k$. Each class is a set $S_i=\{s_1,...,s_l\}$, and the mapping is defined as follows:
$h_i = \sum^l_{a=1}(s_a-c_i)$
$h_i$ is a vector with the same dimension as an instance. $\mathbf{B}_i$ is finally mapped to $H(\mathbf{B}_i) = [h_1,...,h_k]$. If no instance belongs to the class of $c_i$, the corresponding $h_i$ is filled with zeros.

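        # One row per cluster center, one column per feature
        ret_vec = np.zeros((len(centers), self.dimensions - 1))
        instances = self.bags[idx, 0][:, :self.dimensions - 1]
        for idx_ins, ins in enumerate(instances):
            # Accumulate the offset of each instance from its assigned center
            ret_vec[labels[idx_ins]] += ins - centers[labels[idx_ins]]
        # Flatten to a vector of length k_m * (dimensions - 1)
        ret_vec = ret_vec.reshape(-1)
        # Signed square root, then L2 normalization
        ret_vec = np.sign(ret_vec) * np.sqrt(np.abs(ret_vec))
        return ret_vec / dis_euclidean(ret_vec, np.zeros_like(ret_vec))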
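
A self-contained sketch of this mapping is given below. The source does not say how `labels` is produced, so nearest-center assignment is assumed here. The function names are hypothetical, and `np.linalg.norm` stands in for `dis_euclidean`.

```python
import numpy as np


def nearest_center_labels(bag, centers):
    """Assumed labeling rule: index of the closest center per instance."""
    dists = np.linalg.norm(bag[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def decentering_mapping(bag, centers):
    """Sum of (instance - assigned center) per center, flattened."""
    bag = np.asarray(bag, dtype=float)
    centers = np.asarray(centers, dtype=float)
    labels = nearest_center_labels(bag, centers)
    h = np.zeros_like(centers)
    # np.add.at accumulates correctly when the same label repeats;
    # centers with no assigned instance keep their zero row.
    np.add.at(h, labels, bag - centers[labels])
    vec = h.ravel()
    # Signed square root followed by L2 normalization, as in the fragment above.
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    norm = np.linalg.norm(vec)
    # Guard against an all-zero vector (an addition not in the original).
    return vec / norm if norm > 0 else vec


bag = np.array([[1.0, 0.0], [2.0, 0.0], [10.0, 6.0]])
centers = np.array([[0.0, 0.0], [10.0, 10.0], [50.0, 50.0]])
vec = decentering_mapping(bag, centers)
```

Here the first two instances fall to the first center and the third to the second center, so the raw sums are (3, 0), (0, -4) and (0, 0). The third center has no instances and contributes zeros.
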

3. Mean mapping
The instances in $\mathbf{B}_i$ are partitioned by $C$ into $K$ classes, $K\le k$. Each class is a set $S_i=\{s_1,...,s_l\}$, and the mapping is defined as follows:
$h_i = \frac{1}{|S_i|}\sum^l_{a=1}s_a$
$h_i$ is a vector with the same dimension as an instance. $\mathbf{B}_i$ is finally mapped to $H(\mathbf{B}_i) = [h_1,...,h_k]$. If no instance belongs to the class of $c_i$, the corresponding $h_i$ is filled with zeros.

        # One row per cluster center, one column per feature
        ret_vec = np.zeros((self.k_m, self.dimensions - 1))
        instances = self.bags[idx, 0][:, :self.dimensions - 1]
        for idx_ins, ins in enumerate(instances):
            # Sum the instances assigned to each center
            ret_vec[labels[idx_ins]] += ins
        # Divide each non-empty class by its size; empty classes stay zero
        unique, count = np.unique(labels, return_counts=True)
        for key, n in zip(unique, count):
            ret_vec[key] /= n
        # Flatten to a vector of length k_m * (dimensions - 1)
        ret_vec = ret_vec.reshape(-1)
        # Signed square root, then L2 normalization
        ret_vec = np.sign(ret_vec) * np.sqrt(np.abs(ret_vec))
        return ret_vec / dis_euclidean(ret_vec, np.zeros_like(ret_vec))
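
The mean mapping can also be sketched as a standalone function. In this version `labels` is passed in directly, because the source does not show how it is computed. The function name and the zero-norm guard are assumptions for illustration.

```python
import numpy as np


def mean_mapping(bag, labels, k):
    """Per-center mean of the assigned instances, flattened to length k * d."""
    bag = np.asarray(bag, dtype=float)
    labels = np.asarray(labels)
    h = np.zeros((k, bag.shape[1]))
    # Sum the instances of each class, then divide by the class size.
    np.add.at(h, labels, bag)
    counts = np.bincount(labels, minlength=k)
    nonempty = counts > 0
    # Empty classes are skipped, so their rows stay zero.
    h[nonempty] /= counts[nonempty][:, None]
    vec = h.ravel()
    # Signed square root followed by L2 normalization, as in the fragment above.
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


bag = np.array([[1.0, 3.0], [3.0, 5.0], [9.0, 0.0]])
vec = mean_mapping(bag, labels=[0, 0, 2], k=3)
```

For this input the class means are (2, 4), (0, 0) and (9, 0). After the signed square root the vector is (sqrt(2), 2, 0, 0, 3, 0), which is then divided by its norm sqrt(15).
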

4. Proportion mapping
The instances in $\mathbf{B}_i$ are partitioned by $C$ into $K$ classes, $K\le k$. Each class is a set $S_i=\{s_1,...,s_l\}$, and the mapping is defined as follows:
$h_i = \frac{|S_i|}{|\mathbf{B}_i|}$
$\mathbf{B}_i$ is finally mapped to $H(\mathbf{B}_i) = [h_1,...,h_k]$. If no instance belongs to the class of $c_i$, the corresponding $h_i$ is filled with zero.

        ret_vec = np.zeros(self.k_m)
        # Number of instances in bag idx
        bag_size = len(self.bags_size[idx])
        # Count the instances assigned to each center; centers with none stay zero
        unique, count = np.unique(labels, return_counts=True)
        for key, n in zip(unique, count):
            ret_vec[key] = n / bag_size
        return ret_vec
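
As a compact illustration of the same computation, the sketch below uses `np.bincount` in place of the dictionary loop. The function name is hypothetical, and the bag size is taken as the number of labels, which assumes exactly one label per instance.

```python
import numpy as np


def proportion_mapping(labels, k):
    """Fraction of the bag's instances assigned to each of the k centers."""
    labels = np.asarray(labels)
    # minlength=k keeps a zero entry for centers with no instances.
    counts = np.bincount(labels, minlength=k)
    return counts / labels.size


vec = proportion_mapping([0, 0, 2, 0], k=3)
# -> [0.75, 0.0, 0.25]
```

The entries always sum to 1, so no further normalization is applied.
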

Test results

1. Shortest-distance mapping

bag-level classify result:
confusion:
 [[282   2]
 [  2 650]]
precision: 0.9969325153374233
recall: 0.9969325153374233
f1-score: 0.9969325153374233
accuracy: 0.9957264957264957
start training single-instance model----------------
model trainig complete!
Finally instance result:
confusion:
 [[ 337    3]
 [   1 9019]]
precision: 0.9996674794945688
recall: 0.9998891352549889
f1-score: 0.9997782950892363
accuracy: 0.9995726495726496
class-time 27.534499883651733

2. De-centering mapping

bag-level classify result:
confusion:
 [[290   2]
 [ 17 627]]
precision: 0.9968203497615262
recall: 0.9736024844720497
f1-score: 0.9850746268656716
accuracy: 0.9797008547008547
start training single-instance model----------------
model trainig complete!
Finally instance result:
confusion:
 [[ 342    4]
 [   7 9007]]
precision: 0.9995560981023194
recall: 0.9992234302196583
f1-score: 0.9993897364771152
accuracy: 0.9988247863247863
class-time 31.642945766448975

3. Mean mapping

bag-level classify result:
confusion:
 [[307   4]
 [ 10 615]]
precision: 0.9935379644588045
recall: 0.984
f1-score: 0.9887459807073955
accuracy: 0.9850427350427351
start training single-instance model----------------
model trainig complete!
Finally instance result:
confusion:
 [[ 368    5]
 [   1 8986]]
precision: 0.9994438883327772
recall: 0.999888728162902
f1-score: 0.9996662587607075
accuracy: 0.9993589743589744
class-time 32.479421615600586

4. Proportion mapping

bag-level classify result:
confusion:
 [[272  25]
 [  9 630]]
precision: 0.9618320610687023
recall: 0.9859154929577465
f1-score: 0.973724884080371
accuracy: 0.9636752136752137
start training single-instance model----------------
model trainig complete!
Finally instance result:
confusion:
 [[ 333   28]
 [   1 8998]]
precision: 0.9968978506536672
recall: 0.999888876541838
f1-score: 0.9983911234396671
accuracy: 0.9969017094017094
class-time 27.800466299057007
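
The reported precision, recall, f1-score and accuracy can be recomputed from each confusion matrix. The sketch below assumes rows are true classes, columns are predicted classes, and class 1 is the positive class. The source does not state this layout; it is inferred because it reproduces the logged numbers. The function name is hypothetical.

```python
import numpy as np


def binary_metrics(confusion):
    """Precision, recall, F1 and accuracy from a 2x2 confusion matrix.

    Assumed layout: [[TN, FP], [FN, TP]].
    """
    (tn, fp), (fn, tp) = np.asarray(confusion, dtype=float)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy


# Bag-level confusion matrix of the shortest-distance mapping.
precision, recall, f1, accuracy = binary_metrics([[282, 2], [2, 650]])
```

For this matrix, precision and recall are both 650/652 and accuracy is 932/936, which agrees with the log for the shortest-distance mapping.
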

Sharing is welcome. When reposting, please credit the source: 内存溢出

Original URL: http://outofmemory.cn/zaji/925006.html