Close Reading of the AFM Paper

Posted: 2019-04-27 19:31:02


Algorithm Derivation

\[ \text{Set of (non-zero) features: } \chi \]

\[ \text{Embedding output of the (non-zero) features: } \varepsilon = \left \{ v_ix_i \right \}_{i\in \chi } \]

FM model formula:
\[ \widehat{y}_{FM}(x)=w_0+\sum_{i=1}^n w_ix_i+\sum_{i=1}^n \sum_{j=i+1}^n \widehat{w}_{ij}x_ix_j,\qquad \widehat{w}_{ij}=v_i^Tv_j \tag{1} \]
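To make Eq. (1) concrete, here is a minimal sketch in plain Python (no tensor library, which is an assumption made only to keep it self-contained). The function name `fm_predict` and the toy numbers are hypothetical; the pairwise weight is computed as the inner product of the two embedding vectors, as FM prescribes.

```python
from itertools import combinations


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fm_predict(x, w0, w, V):
    """Naive O(k * n^2) evaluation of Eq. (1).

    x  : feature values, length n
    w0 : global bias
    w  : first-order weights, length n
    V  : n embedding vectors v_i, each of length k
    The pairwise weight is w_hat_ij = <v_i, v_j>.
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = sum(
        dot(V[i], V[j]) * x[i] * x[j]
        for i, j in combinations(range(len(x)), 2)  # every pair with j > i
    )
    return linear + pairwise


# Hypothetical toy input: 3 features, embedding size k = 2
x = [1.0, 0.0, 1.0]
V = [[0.5, 1.0], [0.3, -0.2], [1.0, 0.5]]
print(fm_predict(x, 0.1, [0.2, 0.3, -0.1], V))  # ≈ 1.2; only pair (0, 2) is non-zero
```

Every pair shares the same form of weight, which is exactly what AFM later reweights with attention.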

Pair-wise interaction layer (it expands m vectors into m(m - 1)/2 interacted vectors):
\[ f_{PI}(\varepsilon )=\left \{ (v_i \odot v_j)x_ix_j \right \}_{(i,j) \in R_x } \tag{2} \]

\[ \text{where } R_x=\left \{ (i,j) \right \}_{i \in \chi ,\, j \in \chi,\, j>i } \]
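A minimal sketch of the pair-wise interaction layer in Eq. (2), assuming plain Python lists for vectors and a dict keyed by the pair (i, j); the function name `pairwise_interaction_layer` is hypothetical. It keeps only the non-zero features χ, so m = |χ| embedded vectors become m(m - 1)/2 element-wise products.

```python
from itertools import combinations


def pairwise_interaction_layer(x, V):
    """Sketch of Eq. (2): element-wise products of embedded feature pairs.

    Only non-zero features (the set chi) are embedded, so m = |chi|
    vectors expand into m * (m - 1) / 2 interacted vectors, one per
    pair (i, j) in R_x with j > i.
    """
    chi = [i for i, xi in enumerate(x) if xi != 0]        # non-zero features
    eps = {i: [vf * x[i] for vf in V[i]] for i in chi}    # v_i * x_i
    R_x = combinations(chi, 2)                            # pairs with j > i
    return {
        (i, j): [a * b for a, b in zip(eps[i], eps[j])]   # (v_i ⊙ v_j) x_i x_j
        for i, j in R_x
    }


# Hypothetical toy input: 4 features, one of them zero, k = 2
out = pairwise_interaction_layer(
    [1.0, 0.0, 2.0, 1.0],
    [[1.0, 2.0], [3.0, 4.0], [0.5, 1.0], [2.0, 0.0]],
)
print(out)  # 3 non-zero features -> 3 interacted vectors
```

Because (v_i ⊙ v_j) x_i x_j equals (v_i x_i) ⊙ (v_j x_j), the sketch multiplies the already-scaled embeddings ε instead of scaling afterwards.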

The attention network is defined as:
\[ a'_{ij}=h^T\,\mathrm{ReLU}\big(W(v_i \odot v_j)x_ix_j+b\big),\qquad a_{ij}= \frac{\exp(a'_{ij})}{\sum_{(i,j) \in R_x}\exp(a'_{ij})} \tag{5} \]

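\[ W \in \mathbb{R}^{t \times k},\quad b \in \mathbb{R}^t,\quad h \in \mathbb{R}^t \]
where t is the hidden-layer size of the attention network and k is the embedding size (the dimension of each v_i).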
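The attention network in Eq. (5) can be sketched as follows, assuming the interacted vectors arrive as a dict from (i, j) to a k-dimensional list and W is stored as t rows of length k. The name `attention_scores` is hypothetical. Subtracting the maximum score before exponentiating is a standard numerical-stability trick that leaves the softmax unchanged.

```python
import math


def attention_scores(pair_vectors, W, b, h):
    """Sketch of Eq. (5).

    pair_vectors : dict {(i, j): k-dim vector (v_i ⊙ v_j) x_i x_j}
    W : t x k matrix (list of t rows), b : length-t bias, h : length-t vector
    Returns the softmax-normalised weights a_ij over all pairs in R_x.
    """
    raw = {}
    for pair, u in pair_vectors.items():
        # ReLU(W u + b): one hidden unit per row of W
        hidden = [max(0.0, sum(wr * uf for wr, uf in zip(row, u)) + bt)
                  for row, bt in zip(W, b)]
        raw[pair] = sum(ht * zt for ht, zt in zip(h, hidden))  # a'_ij = h^T(...)
    m = max(raw.values())                                      # stability shift
    exps = {pair: math.exp(s - m) for pair, s in raw.items()}
    total = sum(exps.values())
    return {pair: e / total for pair, e in exps.items()}       # a_ij


# Hypothetical toy input: two pairs, t = k = 2
a = attention_scores({(0, 2): [1.0, 4.0], (0, 3): [2.0, 0.0]},
                     W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0], h=[1.0, 1.0])
print(a)  # weights sum to 1; (0, 2) gets the larger share
```

The normalisation runs over R_x only, so the weights always sum to 1 no matter how many non-zero features an instance has.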

Putting it all together, the AFM model is:

\[ \widehat{y}_{AFM}(x)=w_0+\sum_{i=1}^{n}w_ix_i+p^T\sum_{i=1}^{n}\sum_{j=i+1}^{n}a_{ij}(v_i \odot v_j)x_ix_j \]
where p ∈ ℝ^k is the weight vector of the prediction layer.
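Below is a minimal end-to-end sketch of the AFM prediction, assuming plain Python lists for all parameters; it folds Eq. (2) and Eq. (5) into one function so it runs on its own. The name `afm_predict` and the toy values are hypothetical, and so is the early-return convention for inputs with fewer than two non-zero features (the interaction sum is then empty, so it is treated as zero).

```python
import math
from itertools import combinations


def afm_predict(x, w0, w, V, p, W, b, h):
    """Sketch of the AFM prediction formula (all parameters as plain lists).

    x: n feature values; w0: bias; w: n linear weights; V: n embeddings of size k;
    p: length-k prediction weights; W: t x k; b, h: length t.
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    chi = [i for i, xi in enumerate(x) if xi != 0]
    pairs = list(combinations(chi, 2))
    if not pairs:  # assumption: empty R_x contributes nothing
        return linear
    k = len(p)
    # Pair-wise interaction layer: (v_i ⊙ v_j) x_i x_j
    inter = {(i, j): [V[i][f] * V[j][f] * x[i] * x[j] for f in range(k)]
             for i, j in pairs}
    # Attention network: a'_ij = h^T ReLU(W u + b), then softmax over R_x
    raw = {}
    for pair, u in inter.items():
        hidden = [max(0.0, sum(wr * uf for wr, uf in zip(row, u)) + bt)
                  for row, bt in zip(W, b)]
        raw[pair] = sum(ht * zt for ht, zt in zip(h, hidden))
    mx = max(raw.values())
    exps = {pair: math.exp(s - mx) for pair, s in raw.items()}
    z = sum(exps.values())
    # Attention-weighted sum of interacted vectors, then projection with p
    pooled = [sum(exps[pair] / z * inter[pair][f] for pair in pairs)
              for f in range(k)]
    return linear + sum(pf * sf for pf, sf in zip(p, pooled))


# Hypothetical toy input: n = 4, k = t = 2, h = 0 gives uniform attention
x = [1.0, 0.0, 1.0, 1.0]
V = [[0.5, 1.0], [0.3, -0.2], [1.0, 0.5], [0.2, 0.4]]
print(afm_predict(x, 0.1, [0.2, 0.3, -0.1, 0.0], V,
                  p=[1.0, 1.0], W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0], h=[0.0, 0.0]))
```

Setting h to zero makes every a'_ij equal, so each a_ij becomes 1/|R_x|; with p as the all-ones vector the interaction term is then FM's pairwise term divided by |R_x|. This is a quick sanity check on the sketch, not a result from the paper.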

Model parameter set:
\[ \Theta =\left \{ w_0, \left \{ w_i \right \}_{i=1}^n,\left \{ v_i \right \}_{i=1}^n ,p,W,b,h \right \} \]
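Counting the scalars in Θ shows how little the attention part adds on top of FM. This is a small arithmetic sketch; the helper names and the example sizes (n = 1000, k = t = 16) are hypothetical and not taken from the paper's experiments.

```python
def afm_num_params(n, k, t):
    """Count the scalars in Theta = {w0, {w_i}, {v_i}, p, W, b, h}.

    n: number of features, k: embedding size, t: attention hidden size.
    """
    return (1          # w0
            + n        # w_i
            + n * k    # v_i, each in R^k
            + k        # p in R^k
            + t * k    # W in R^{t x k}
            + t        # b in R^t
            + t)       # h in R^t


def fm_num_params(n, k):
    """Same count for plain FM: w0, w_i and v_i."""
    return 1 + n + n * k


print(afm_num_params(1000, 16, 16))  # 17305
print(fm_num_params(1000, 16))       # 17001
```

The extra cost is k + tk + 2t, which does not depend on the number of features n, so the embeddings dominate the parameter count just as in FM.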

Key Points of the Paper

We point out that in these methods (e.g., WDL, DCN), feature interactions are implicitly captured by a deep neural network, rather than FM that explicitly models each interaction as the inner product of two features. As such, these deep methods are not interpretable, as the contribution of each feature interaction is unknown. By directly extending FM with the attention mechanism that learns the importance of each feature interaction, our AFM is more interpretable and empirically demonstrates superior performance over Wide&Deep and DeepCross.
RQ1 How do the key hyper-parameters of AFM (i.e., dropout on feature interactions and regularization on the attention network) impact its performance?
Tune the dropout ratio and the L2 regularization coefficient separately on the public datasets.
RQ2 Can the attention network effectively learn the importance of feature interactions?
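Compare training only the embeddings with training only the attention network.
RQ3 How does AFM perform as compared to the state-of-the-art methods for sparse data prediction?
Compare the number of parameters and the loss on the public datasets: AFM uses fewer parameters and achieves lower loss.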

Original post: https://www.cnblogs.com/arachis/p/AFM_detail.html