SiamFC: the template features are used directly as the kernel to cross-correlate with the search-frame features, which in implementation is just a convolution operation (see the schematic below):
The template-frame feature slides over the search-frame feature, the corresponding channels take inner products with each other, and the final output is a feature map with a single channel.
For example, with input=6x6, in_channels=3, out_channels=4 and kernel_size=3, an ordinary convolution needs 4 kernels of size 3x3x3; each 3x3x3 kernel produces a 1x4x4 result, so the overall output is 4x4x4 (3x4 = 12 per-channel correlations are computed).
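As a quick sanity check on these numbers (a minimal sketch; the tensor names x and weight are made up for illustration), a standard F.conv2d with exactly these settings gives the output described above:

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 6, 6)        # input: 6x6 with in_channels=3
weight = torch.randn(4, 3, 3, 3)   # 4 kernels of size 3x3x3 (out_channels=4)
out = F.conv2d(x, weight)          # each kernel correlates with all 3 input channels and sums them
print(out.shape)                   # torch.Size([1, 4, 4, 4]) -> the 4x4x4 output described above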
import torch.nn.functional as F

def naive_xcorr(z, x):
    # naive cross correlation: each template z[i] acts as a convolution
    # kernel over its corresponding search feature x[i]
    nz = z.size(0)
    nx, c, h, w = x.size()
    x = x.view(-1, nz * c, h, w)                        # fold the batch into the channel dimension
    out = F.conv2d(x, z, groups=nz)                     # one group per sample in the batch
    out = out.view(nx, -1, out.size(-2), out.size(-1))  # back to (batch, 1, H, W)
    return out
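For example (the feature sizes here are only illustrative, roughly in the range used by SiamFC-style trackers):

import torch

z = torch.randn(1, 256, 6, 6)     # template (exemplar) features
x = torch.randn(1, 256, 22, 22)   # search-region features
out = naive_xcorr(z, x)
print(out.shape)                  # torch.Size([1, 1, 17, 17]) -- a single-channel response map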
depth-wise correlation
SiamRPN++: the idea is much the same as naive correlation, except that naive correlation sums the result over all channels, while depth-wise correlation outputs one channel for every input channel (see the schematic below):
For example, with input=6x6, in_channels=4, out_channels=4 and kernel_size=3, only 1 set of 3x3x4 kernels is needed; each 3x3x1 kernel is correlated with its corresponding channel to give a 1x4x4 result, so the overall output is 4x4x4 (only 4 per-channel correlations are computed).
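Again as a small sanity check (a minimal sketch with made-up tensor names), setting groups equal to the number of input channels in F.conv2d reproduces exactly this depth-wise behaviour:

import torch
import torch.nn.functional as F

x = torch.randn(1, 4, 6, 6)          # input: 6x6 with in_channels=4
weight = torch.randn(4, 1, 3, 3)     # one 3x3x1 kernel per channel
out = F.conv2d(x, weight, groups=4)  # groups = in_channels -> depth-wise convolution
print(out.shape)                     # torch.Size([1, 4, 4, 4]) -> 4x4x4, from only 4 correlations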
import torch

def depthwise_xcorr(search, kernel):
    """depthwise cross correlation
    """
    batch = kernel.size(0)
    channel = kernel.size(1)
    search = search.view(1, batch * channel, search.size(2), search.size(3))
    kernel = kernel.reshape(batch * channel, 1, kernel.size(2), kernel.size(3))
    out = torch.nn.functional.conv2d(search, kernel, groups=batch * channel)
    # when groups equals the number of input channels, conv2d is exactly depth-wise correlation
    out = out.view(batch, channel, out.size(2), out.size(3))
    return out
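A quick shape check (the sizes are illustrative, loosely modelled on SiamRPN++-style features):

import torch

kernel = torch.randn(2, 256, 5, 5)    # template features
search = torch.randn(2, 256, 29, 29)  # search-region features
out = depthwise_xcorr(search, kernel)
print(out.shape)                      # torch.Size([2, 256, 25, 25]) -- the channel count is preserved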
pixel-wise correlation
Although Alpha-Refine mentions this kind of correlation, it actually adopts the idea from the paper Ranking Attention Network for Fast Video Object Segmentation; since both methods ultimately have to predict a mask, Alpha-Refine borrowed it.
Each of the Hz x Wz 1x1xC features of the template is convolved with the search-frame features, so the output has Hz*Wz channels; because the kernel is 1x1, the spatial size is unchanged and stays Hx x Wx. This does not even need a convolution function: a plain tensor matrix multiplication is enough.
import torch

def pixelwise_xcorr(kernel, search):
    b, c, h, w = search.shape
    # each 1x1xC template vector acts as a 1x1 convolution kernel over the search feature;
    # the transpose simply lines the shapes up for the matrix multiplication below
    ker = kernel.reshape(b, c, -1).transpose(1, 2)  # (b, Hz*Wz, c)
    feat = search.reshape(b, c, -1)                 # (b, c, Hx*Wx)
    corr = torch.matmul(ker, feat)                  # (b, Hz*Wz, Hx*Wx)
    corr = corr.reshape(*corr.shape[:2], h, w)      # (b, Hz*Wz, Hx, Wx)
    return corr
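And a shape check for the pixel-wise case (the sizes are again chosen only for illustration):

import torch

kernel = torch.randn(2, 64, 4, 4)      # template features, Hz=Wz=4
search = torch.randn(2, 64, 16, 16)    # search-region features, Hx=Wx=16
corr = pixelwise_xcorr(kernel, search)
print(corr.shape)                      # torch.Size([2, 16, 16, 16]) -- channels = Hz*Wz = 16, spatial size unchanged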
Besides these, cross correlation also has other variants such as pixel-to-global correlation, saliency-associated correlation, and the one used in AutoMatch; see here for details.