Factorization Machine (FM, 2010) is a machine learning algorithm based on matrix factorization, proposed by Steffen Rendle. Its key strengths are that it incorporates cross features easily and handles highly sparse data well, so it is mainly used in recommender systems and ad CTR estimation.
Mathematical derivation

The original FM model equation is:
$$\hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$$
The first two terms of this expression form a simple linear model and need no further comment. The focus here is the last term:
$$\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$$
Computing this term directly from the formula above costs $O(n^2)$. Thanks to the factorized parameterization, it can be rearranged into a linear-time $O(kn)$ form. The derivation is as follows:
$$\begin{aligned}
&\sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j \\
&= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \langle v_i, v_j \rangle x_i x_j - \frac{1}{2}\sum_{i=1}^{n} \langle v_i, v_i \rangle x_i x_i \\
&= \frac{1}{2}\left(\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{f=1}^{k} v_{if} v_{jf} x_i x_j - \sum_{i=1}^{n}\sum_{f=1}^{k} v_{if} v_{if} x_i x_i\right) \\
&= \frac{1}{2}\sum_{f=1}^{k}\left[\left(\sum_{i=1}^{n} v_{if} x_i\right)\left(\sum_{j=1}^{n} v_{jf} x_j\right) - \sum_{i=1}^{n} v_{if}^2 x_i^2\right] \\
&= \frac{1}{2}\sum_{f=1}^{k}\left[\left(\sum_{i=1}^{n} v_{if} x_i\right)^2 - \sum_{i=1}^{n} v_{if}^2 x_i^2\right]
\end{aligned}$$
where
$$\langle v_i, v_j \rangle = \sum_{f=1}^{k} v_{if} v_{jf}$$
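This identity is easy to sanity-check numerically. Below is a small sketch (not part of the original derivation) that compares the naive $O(n^2)$ pairwise sum with the $O(kn)$ rewrite on random data:

```python
import torch

# Arbitrary sizes for the check
n, k = 8, 4
x = torch.randn(n)
v = torch.randn(n, k)

# Naive O(n^2): sum over all pairs i < j of <v_i, v_j> * x_i * x_j
naive = sum(torch.dot(v[i], v[j]) * x[i] * x[j]
            for i in range(n) for j in range(i + 1, n))

# O(kn) form: 1/2 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]
vx = v.T @ x                       # shape (k,), one inner sum per factor f
fast = 0.5 * (vx.pow(2).sum() - ((v ** 2).T @ (x ** 2)).sum())

assert torch.allclose(naive, fast, atol=1e-5)
```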
Putting it together, the full FM equation is:
$$\hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i + \frac{1}{2}\sum_{f=1}^{k}\left[\left(\sum_{i=1}^{n} v_{if} x_i\right)^2 - \sum_{i=1}^{n} v_{if}^2 x_i^2\right]$$
Dense numerical features

For tasks whose raw features are numerical, the formula above can be used directly. The implementation is as follows:
1. First, rewrite the formula above as matrix multiplications, which makes the subsequent implementation straightforward.
Each input sample can be represented as:
$$X = (x_1, x_2, \ldots, x_i, \ldots, x_n) \in \mathbb{R}^{1 \times n}$$
Let the learnable weight matrices be:
$$W = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_i \\ \vdots \\ w_n \end{pmatrix} \in \mathbb{R}^{n \times 1}$$
and
$$V = \begin{pmatrix}
v_{11} & v_{12} & \cdots & v_{1f} & \cdots & v_{1k} \\
v_{21} & v_{22} & \cdots & v_{2f} & \cdots & v_{2k} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
v_{i1} & v_{i2} & \cdots & v_{if} & \cdots & v_{ik} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
v_{n1} & v_{n2} & \cdots & v_{nf} & \cdots & v_{nk}
\end{pmatrix} \in \mathbb{R}^{n \times k}$$
Then the FM model equation in matrix form is:
$$\begin{aligned}
\hat{y}(x) &:= w_0 + \sum_{i=1}^{n} w_i x_i + \frac{1}{2}\sum_{f=1}^{k}\left[\left(\sum_{i=1}^{n} v_{if} x_i\right)^2 - \sum_{i=1}^{n} v_{if}^2 x_i^2\right] \\
&= w_0 + XW + \frac{1}{2}\sum_{f=1}^{k}\left\{\left[(x_1, x_2, \ldots, x_n)\begin{pmatrix} v_{1f} \\ v_{2f} \\ \vdots \\ v_{nf} \end{pmatrix}\right]^2 - (x_1^2, x_2^2, \ldots, x_n^2)\begin{pmatrix} v_{1f}^2 \\ v_{2f}^2 \\ \vdots \\ v_{nf}^2 \end{pmatrix}\right\} \\
&= w_0 + XW + \frac{1}{2}\,\mathrm{sum}\big((XV) \circ (XV) - (X \circ X)(V \circ V),\ \mathrm{axis}=1\big)
\end{aligned}$$
where $\circ$ denotes the Hadamard product, i.e. element-wise multiplication of two matrices of the same shape.
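The last line is a single vectorized expression. As a quick illustration (random data, purely a sketch), it can be written in NumPy as:

```python
import numpy as np

n, k = 6, 3
X = np.random.randn(1, n)          # one sample, shape (1, n)
W = np.random.randn(n, 1)
V = np.random.randn(n, k)
w0 = 0.1

XV = X @ V                         # shape (1, k)
y_hat = w0 + X @ W + 0.5 * np.sum(XV * XV - (X * X) @ (V * V),
                                  axis=1, keepdims=True)
print(y_hat.shape)                 # (1, 1)
```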
2. PyTorch implementation:
```python
import torch
import torch.nn as nn


class FactorizationMachine(nn.Module):
    def __init__(self, n, k):
        super(FactorizationMachine, self).__init__()
        self.n = n
        self.k = k
        self.linear = nn.Linear(self.n, 1, bias=True)
        # Note: the weight matrix is (k, n), transposed relative to the formula,
        # so that the next step can initialize it across the n dimension.
        self.v = nn.Parameter(torch.Tensor(self.k, self.n))
        nn.init.xavier_uniform_(self.v)

    def forward(self, x):
        """
        :param x: Float tensor of size ``(b, n)``
        :return: Float tensor of size ``(b, 1)``
        """
        x1 = self.linear(x)
        square_of_sum = torch.mm(x, self.v.T) * torch.mm(x, self.v.T)
        sum_of_square = torch.mm(x * x, self.v.T * self.v.T)
        x2 = 0.5 * torch.sum(square_of_sum - sum_of_square, dim=-1, keepdim=True)
        return x1 + x2
```
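A quick smoke test of the module above (a minimal sketch; the batch size and the values of n and k are arbitrary):

```python
import torch

model = FactorizationMachine(n=10, k=4)
x = torch.randn(32, 10)        # batch of 32 samples with 10 numeric features
y_hat = model(x)               # raw scores, shape (32, 1)
print(y_hat.shape)             # torch.Size([32, 1])
```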
Sparse categorical features

1. Categorical features cannot be fed into the FM equation directly; they must first be converted into dense embedding vectors, as follows.
As before, let each input sample be:
$$X = (x_1, x_2, \ldots, x_i, \ldots, x_n) \in \mathbb{R}^{1 \times n}$$
Let the embedding functions be:
$$E_{w_i}(x_i) = \mathrm{embedding\_lookup}(w_i, x_i) \in \mathbb{R}^{1 \times 1}, \qquad \text{so}\quad E_W(X) \in \mathbb{R}^{1 \times n \times 1}$$
and
$$E_{v_i}(x_i) = \mathrm{embedding\_lookup}(v_i, x_i) \in \mathbb{R}^{1 \times emb\_dim}, \qquad \text{so}\quad E_V(X) \in \mathbb{R}^{1 \times n \times emb\_dim}$$
where $x_i \in \mathbb{N}$ is the class value (a natural number) of the $i$-th feature.
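In PyTorch, this embedding_lookup corresponds to torch.nn.Embedding. A minimal sketch (the field count, vocabulary size, and indices below are made up for illustration):

```python
import torch

n, emb_dim, vocab = 3, 4, 100             # 3 categorical fields, 4-dim factors
E_w = torch.nn.Embedding(vocab, 1)        # per-feature linear weights
E_v = torch.nn.Embedding(vocab, emb_dim)  # per-feature factor vectors

x = torch.tensor([[5, 17, 42]])           # one sample: a class index per field
print(E_w(x).shape)                       # torch.Size([1, 3, 1])  -> R^{1×n×1}
print(E_v(x).shape)                       # torch.Size([1, 3, 4])  -> R^{1×n×emb_dim}
```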
Then the FM equation can be written as:
$$\begin{aligned}
\hat{y}(x) &:= w_0 + \sum_{i=1}^{n} E_{w_i}(x_i) + \frac{1}{2}\sum_{f=1}^{emb\_dim}\left[\left(\sum_{i=1}^{n} E_{v_{if}}(x_i)\right)^2 - \sum_{i=1}^{n} E_{v_{if}}(x_i)^2\right] \\
&= w_0 + \mathrm{sum}(E_W(X),\ \mathrm{axis}=1) \\
&\quad + \frac{1}{2}\,\mathrm{sum}\Big(\mathrm{sum}(E_V(X),\ \mathrm{axis}=1) \circ \mathrm{sum}(E_V(X),\ \mathrm{axis}=1) - \mathrm{sum}(E_V(X) \circ E_V(X),\ \mathrm{axis}=1),\ \mathrm{axis}=1\Big)
\end{aligned}$$
2. PyTorch implementation:
The source code comes from https://github.com/rixwew/pytorch-fm/blob/master/torchfm/model/fm.py, consolidated here:
```python
import numpy as np
import torch


class FeaturesLinear(torch.nn.Module):

    def __init__(self, field_dims, output_dim=1):
        super().__init__()
        self.fc = torch.nn.Embedding(sum(field_dims), output_dim)
        self.bias = torch.nn.Parameter(torch.zeros((output_dim,)))
        # Offsets map each field's local class index into one shared embedding table.
        self.offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.int64)

    def forward(self, x):
        """
        :param x: Long tensor of size ``(batch_size, num_fields)``
        """
        x = x + x.new_tensor(self.offsets).unsqueeze(0)
        return torch.sum(self.fc(x), dim=1) + self.bias


class FeaturesEmbedding(torch.nn.Module):

    def __init__(self, field_dims, embed_dim):
        super().__init__()
        self.embedding = torch.nn.Embedding(sum(field_dims), embed_dim)
        self.offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.int64)
        torch.nn.init.xavier_uniform_(self.embedding.weight.data)

    def forward(self, x):
        """
        :param x: Long tensor of size ``(batch_size, num_fields)``
        """
        x = x + x.new_tensor(self.offsets).unsqueeze(0)
        return self.embedding(x)


class FactorizationMachine(torch.nn.Module):

    def __init__(self, reduce_sum=True):
        super().__init__()
        self.reduce_sum = reduce_sum

    def forward(self, x):
        """
        :param x: Float tensor of size ``(batch_size, num_fields, embed_dim)``
        """
        square_of_sum = torch.sum(x, dim=1) ** 2
        sum_of_square = torch.sum(x ** 2, dim=1)
        ix = square_of_sum - sum_of_square
        if self.reduce_sum:
            ix = torch.sum(ix, dim=1, keepdim=True)
        return 0.5 * ix


# Public-facing model
class FactorizationMachineModel(torch.nn.Module):
    """
    A pytorch implementation of Factorization Machine.

    Reference:
        S Rendle, Factorization Machines, 2010.
    """

    def __init__(self, field_dims, embed_dim):
        super().__init__()
        self.embedding = FeaturesEmbedding(field_dims, embed_dim)
        self.linear = FeaturesLinear(field_dims)
        self.fm = FactorizationMachine(reduce_sum=True)

    def forward(self, x):
        """
        :param x: Long tensor of size ``(batch_size, num_fields)``
        """
        x = self.linear(x) + self.fm(self.embedding(x))
        return torch.sigmoid(x.squeeze(1))
```
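A usage sketch (the field cardinalities and indices below are invented for illustration):

```python
import torch

field_dims = [10, 20, 30]                 # three categorical fields and their vocab sizes
model = FactorizationMachineModel(field_dims, embed_dim=8)

x = torch.tensor([[3, 11, 25],
                  [0, 5, 7]])             # per-field class indices, shape (2, 3)
p = model(x)                              # CTR-style probabilities, shape (2,)
print(p)
```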