【Pytorch】FM Derivation and Implementation


The Factorization Machine (FM, 2010) is a machine-learning algorithm based on matrix factorization, proposed by Steffen Rendle. Its main strengths are that it naturally incorporates feature interactions and handles highly sparse data; it is used mainly in recommender systems and ad CTR prediction.

Mathematical Derivation

The original FM model equation is:
$$\hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$$
The first two terms of this equation are just a linear model and need no further explanation. The focus here is the last term:
$$\sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$$
Computed naively, this term costs $O(kn^2)$: there are $O(n^2)$ feature pairs, each requiring a $k$-dimensional inner product. It can be algebraically rearranged into a linear-time $O(kn)$ form, derived as follows:
$$\begin{aligned}
&\sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j \\
={}& \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \langle v_i, v_j \rangle x_i x_j - \frac{1}{2}\sum_{i=1}^{n} \langle v_i, v_i \rangle x_i x_i \\
={}& \frac{1}{2}\left(\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{f=1}^{k} v_{if} v_{jf} x_i x_j - \sum_{i=1}^{n}\sum_{f=1}^{k} v_{if} v_{if} x_i x_i\right) \\
={}& \frac{1}{2}\sum_{f=1}^{k}\left[\left(\sum_{i=1}^{n} v_{if} x_i\right)\left(\sum_{j=1}^{n} v_{jf} x_j\right) - \sum_{i=1}^{n} v_{if}^2 x_i^2\right] \\
={}& \frac{1}{2}\sum_{f=1}^{k}\left[\left(\sum_{i=1}^{n} v_{if} x_i\right)^2 - \sum_{i=1}^{n} v_{if}^2 x_i^2\right]
\end{aligned}$$
where $\langle v_i, v_j \rangle = \sum_{f=1}^{k} v_{if} v_{jf}$.

The complete FM equation is therefore:
$$\hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i + \frac{1}{2}\sum_{f=1}^{k}\left[\left(\sum_{i=1}^{n} v_{if} x_i\right)^2 - \sum_{i=1}^{n} v_{if}^2 x_i^2\right]$$
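As a quick sanity check on the reformulation (a small test of my own, not part of the original derivation), the brute-force pairwise sum and the $O(kn)$ form can be compared numerically:

import torch

n, k = 8, 4
x = torch.randn(n)
v = torch.randn(n, k)

# Brute force: sum <v_i, v_j> * x_i * x_j over all pairs i < j
pairwise = sum(torch.dot(v[i], v[j]) * x[i] * x[j]
               for i in range(n) for j in range(i + 1, n))

# Reformulated O(kn) form: 1/2 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]
xv = v.t() @ x                                      # (k,): sum_i v_if * x_i
fast = 0.5 * (xv.pow(2).sum() - ((v ** 2).t() @ (x ** 2)).sum())

print(torch.allclose(pairwise, fast, atol=1e-5))    # True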

Implementation

Dense Numerical Features

For tasks whose raw features are numerical, the formula above can be used directly. The implementation proceeds as follows.

1. First, rewrite the equation in matrix form, which makes the subsequent code straightforward.

Let each input sample be represented as:
$$X = (x_1, x_2, \ldots, x_i, \ldots, x_n) \in \mathbb{R}^{1 \times n}$$
Let the learnable weight matrices be:
$$W = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_i \\ \vdots \\ w_n \end{pmatrix} \in \mathbb{R}^{n \times 1}$$

$$V = \begin{pmatrix}
v_{11} & v_{12} & \cdots & v_{1f} & \cdots & v_{1k} \\
v_{21} & v_{22} & \cdots & v_{2f} & \cdots & v_{2k} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
v_{i1} & v_{i2} & \cdots & v_{if} & \cdots & v_{ik} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
v_{n1} & v_{n2} & \cdots & v_{nf} & \cdots & v_{nk}
\end{pmatrix} \in \mathbb{R}^{n \times k}$$
Then the FM model equation in matrix form is:
$$\begin{aligned}
\hat{y}(x) :={}& w_0 + \sum_{i=1}^{n} w_i x_i + \frac{1}{2}\sum_{f=1}^{k}\left[\left(\sum_{i=1}^{n} v_{if} x_i\right)^2 - \sum_{i=1}^{n} v_{if}^2 x_i^2\right] \\
={}& w_0 + XW + \frac{1}{2}\sum_{f=1}^{k}\left\{\left[(x_1, x_2, \ldots, x_n)\begin{pmatrix} v_{1f} \\ v_{2f} \\ \vdots \\ v_{nf} \end{pmatrix}\right]^2 - (x_1^2, x_2^2, \ldots, x_n^2)\begin{pmatrix} v_{1f}^2 \\ v_{2f}^2 \\ \vdots \\ v_{nf}^2 \end{pmatrix}\right\} \\
={}& w_0 + XW + \frac{1}{2}\,\mathrm{sum}\big((XV) \circ (XV) - (X \circ X)(V \circ V),\ \mathrm{axis}=1\big)
\end{aligned}$$
where $\circ$ denotes the Hadamard product, i.e. element-wise multiplication of two matrices of the same shape.

2. PyTorch implementation:

import torch
import torch.nn as nn


class FactorizationMachine(nn.Module):
    def __init__(self, n, k):
        super(FactorizationMachine, self).__init__()
        self.n = n
        self.k = k
        self.linear = nn.Linear(self.n, 1, bias=True)
        # Note: the weight matrix is stored as (k, n), the transpose of V in the
        # formula, so that the Xavier initialization below uses n as the fan-in.
        self.v = nn.Parameter(torch.Tensor(self.k, self.n))
        nn.init.xavier_uniform_(self.v)

    def forward(self, x):
        """
        :param x: Float tensor of size ``(b, n)``
        :return: Float tensor of size ``(b, 1)``
        """
        x1 = self.linear(x)                                      # w_0 + XW
        xv = torch.mm(x, self.v.t())                             # XV, size (b, k)
        square_of_sum = xv * xv                                  # (XV) ∘ (XV)
        sum_of_square = torch.mm(x * x, (self.v * self.v).t())   # (X∘X)(V∘V)
        x2 = 0.5 * torch.sum(square_of_sum - sum_of_square, dim=-1, keepdim=True)
        return x1 + x2
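
A quick usage sketch (the shapes below are illustrative, not from the original post):

# 16 samples, each with 10 dense numerical features, latent dimension k = 5
model = FactorizationMachine(n=10, k=5)
x = torch.randn(16, 10)
y_hat = model(x)
print(y_hat.shape)  # torch.Size([16, 1])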
Sparse Categorical Features

1. Categorical features cannot be fed into the FM equation directly; they must first be converted into dense embedding vectors, as follows.

As before, let each input sample be:
$$X = (x_1, x_2, \ldots, x_i, \ldots, x_n) \in \mathbb{R}^{1 \times n}$$
Define the embedding lookup functions:
$$E_{w_i}(x_i) = \mathrm{embedding\_lookup}(w_i, x_i) \in \mathbb{R}^{1 \times 1}, \quad \text{so } E_W(X) \in \mathbb{R}^{1 \times n \times 1}$$

$$E_{v_i}(x_i) = \mathrm{embedding\_lookup}(v_i, x_i) \in \mathbb{R}^{1 \times \mathrm{emb\_dim}}, \quad \text{so } E_V(X) \in \mathbb{R}^{1 \times n \times \mathrm{emb\_dim}}$$
where $x_i \in \mathbb{N}$ is the category index of the $i$-th feature, a natural number.

The FM equation can then be written as:
$$\begin{aligned}
\hat{y}(x) :={}& w_0 + \sum_{i=1}^{n} E_{w_i}(x_i) + \frac{1}{2}\sum_{f=1}^{\mathrm{emb\_dim}}\left[\left(\sum_{i=1}^{n} E_{v_{if}}(x_i)\right)^2 - \sum_{i=1}^{n} E_{v_{if}}(x_i)^2\right] \\
={}& w_0 + \mathrm{sum}(E_W(X),\ \mathrm{axis}=1) \\
&+ \frac{1}{2}\,\mathrm{sum}\big(\mathrm{sum}(E_V(X),\ \mathrm{axis}=1) \circ \mathrm{sum}(E_V(X),\ \mathrm{axis}=1) - \mathrm{sum}(E_V(X) \circ E_V(X),\ \mathrm{axis}=1),\ \mathrm{axis}=1\big)
\end{aligned}$$
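Intuitively, looking up row $x_i$ of an embedding table is the same as multiplying a one-hot encoding of $x_i$ by that table, so this formulation is exactly the dense matrix form applied to one-hot features. A small check of that claim (my own illustration, not from the original post):

import torch
import torch.nn.functional as F

num_categories, emb_dim = 6, 3
table = torch.randn(num_categories, emb_dim)

idx = torch.tensor([2])                          # category index of one feature
lookup = F.embedding(idx, table)                 # (1, emb_dim)
onehot = F.one_hot(idx, num_categories).float()  # (1, num_categories)
print(torch.allclose(lookup, onehot @ table))    # True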
2. PyTorch implementation:

The source code comes from https://github.com/rixwew/pytorch-fm/blob/master/torchfm/model/fm.py, lightly consolidated here:

import numpy as np
import torch


class FeaturesLinear(torch.nn.Module):
    def __init__(self, field_dims, output_dim=1):
        super().__init__()
        self.fc = torch.nn.Embedding(sum(field_dims), output_dim)
        self.bias = torch.nn.Parameter(torch.zeros((output_dim,)))
        # Offsets shift each field's local category index into one shared embedding table
        self.offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.int64)

    def forward(self, x):
        """
        :param x: Long tensor of size ``(batch_size, num_fields)``
        """
        x = x + x.new_tensor(self.offsets).unsqueeze(0)
        return torch.sum(self.fc(x), dim=1) + self.bias
    
    
class FeaturesEmbedding(torch.nn.Module):
    def __init__(self, field_dims, embed_dim):
        super().__init__()
        self.embedding = torch.nn.Embedding(sum(field_dims), embed_dim)
        self.offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.int64)
        torch.nn.init.xavier_uniform_(self.embedding.weight.data)

    def forward(self, x):
        """
        :param x: Long tensor of size ``(batch_size, num_fields)``
        """
        x = x + x.new_tensor(self.offsets).unsqueeze(0)
        return self.embedding(x)
    
    
class FactorizationMachine(torch.nn.Module):
    def __init__(self, reduce_sum=True):
        super().__init__()
        self.reduce_sum = reduce_sum

    def forward(self, x):
        """
        :param x: Float tensor of size ``(batch_size, num_fields, embed_dim)``
        """
        square_of_sum = torch.sum(x, dim=1) ** 2
        sum_of_square = torch.sum(x ** 2, dim=1)
        ix = square_of_sum - sum_of_square
        if self.reduce_sum:
            ix = torch.sum(ix, dim=1, keepdim=True)
        return 0.5 * ix
    
    
# Top-level interface
class FactorizationMachineModel(torch.nn.Module):
    """
    A pytorch implementation of Factorization Machine.

    Reference:
        S Rendle, Factorization Machines, 2010.
    """

    def __init__(self, field_dims, embed_dim):
        super().__init__()
        self.embedding = FeaturesEmbedding(field_dims, embed_dim)
        self.linear = FeaturesLinear(field_dims)
        self.fm = FactorizationMachine(reduce_sum=True)

    def forward(self, x):
        """
        :param x: Long tensor of size ``(batch_size, num_fields)``
        """
        x = self.linear(x) + self.fm(self.embedding(x))
        return torch.sigmoid(x.squeeze(1))
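
A usage sketch (the field cardinalities and batch size are made up for illustration). Note how the offsets computed above map per-field indices into the shared table: with field_dims = [10, 20, 5], offsets = [0, 10, 30], so a sample (3, 7, 2) is looked up as rows (3, 17, 32).

# Three categorical fields with 10, 20, and 5 categories respectively
field_dims = [10, 20, 5]
model = FactorizationMachineModel(field_dims, embed_dim=4)

# Each column holds the category index of the corresponding field
x = torch.stack([torch.randint(0, d, (8,)) for d in field_dims], dim=1)  # (8, 3)
y_hat = model(x)
print(y_hat.shape)  # torch.Size([8]) -- values in (0, 1)

Since the output passes through a sigmoid, the model can be trained directly with torch.nn.BCELoss on binary click labels for CTR-style tasks.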
