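A paper from the University of Science and Technology of China (USTC), published at AAAI 2020.
It proposes an end-to-end learning framework that combines multiple spatial and temporal cues. It reports RWTH-v2 WER: 21.1 and RWTH-v3 WER: 19.6, which was state of the art at the time.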
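For readers unfamiliar with the metric, WER (word error rate) is the edit distance between the predicted and reference gloss sequences, divided by the reference length. The sketch below is a generic illustration, not the authors' evaluation script; the function names and the gloss strings in the usage note are made up.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two token sequences."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_token in enumerate(reference, start=1):
        current = [i]
        for j, hyp_token in enumerate(hypothesis, start=1):
            substitution = 0 if ref_token == hyp_token else 1
            current.append(min(
                previous[j] + 1,                 # deletion
                current[j - 1] + 1,              # insertion
                previous[j - 1] + substitution,  # match or substitution
            ))
        previous = current
    return previous[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, relative to the reference length."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer(["A", "B", "C", "D"], ["A", "C", "D"])` counts one deletion against four reference glosses, giving 25.0.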
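The main contributions are:
1. Spatial modeling. Pose estimation predicts 7 upper-body keypoints. Patches strongly related to sign language, such as the left-hand, right-hand and face patches, are then cropped around those keypoints and used as separate cues for feature extraction. The module finally outputs feature vectors from four sources: face, hands, full frame, and pose.
2. Temporal modeling. There are two paths: one models temporal information across cues (inter-cue) and the other models it within each cue (intra-cue), so that the temporal information of every cue is fully exploited.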
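The keypoint-centered cropping in the spatial step can be sketched as follows. This is only an illustration on a plain 2D grid: the function name, the patch size, and the clamping behavior at the borders are assumptions, and the notes do not say whether the authors crop from raw frames or from feature maps.

```python
def crop_patch(image, center, size):
    """Crop a size x size patch around `center` from a 2D grid.

    image:  H x W nested list
    center: (row, col) of the keypoint (hypothetical input format)
    size:   side length of the square patch (hypothetical value)
    """
    height, width = len(image), len(image[0])
    half = size // 2
    # Clamp the top-left corner so the patch stays inside the grid.
    top = min(max(center[0] - half, 0), max(height - size, 0))
    left = min(max(center[1] - half, 0), max(width - size, 0))
    return [row[left:left + size] for row in image[top:top + size]]
```

Calling it once per keypoint of interest (for instance a wrist for a hand patch, the nose for the face patch) yields one patch per cue.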
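In addition, it adopts the staged optimization strategy proposed in earlier papers, generating pseudo labels for iterative optimization.
The TMC module proposed by the authors aims to integrate spatial-temporal information along two paths, inter-cue and intra-cue, rather than simply fusing the cues.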
The intra-cue path captures the unique features of each visual cue.
The inter-cue path learns the combination of fused features from different cues at different time scales.
The first path is to provide unique features of different cues at different time scales.
The second path is to perform the temporal transformation on the inter-cue feature from the previous block and fuse information from the intra-cue path.
Each block is followed by a temporal pooling (TP) layer: temporal max-pooling with kernel size = 2 and stride = 2.
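To make the effect of the TP layer concrete, here is a minimal pure-Python sketch of temporal max-pooling with kernel size 2 and stride 2. The function name and the list-of-vectors input format are illustrative assumptions. In a deep learning framework the same operation would be a 1D max-pooling layer over the time axis.

```python
def temporal_max_pool(sequence, kernel_size=2, stride=2):
    """Max-pool a sequence of feature vectors along the time axis.

    sequence: list of T feature vectors (each a list of floats).
    Returns roughly T / stride vectors; a trailing frame that does
    not fill a complete window is dropped.
    """
    pooled = []
    for start in range(0, len(sequence) - kernel_size + 1, stride):
        window = sequence[start:start + kernel_size]
        # Element-wise maximum over the frames in the window.
        pooled.append([max(values) for values in zip(*window)])
    return pooled
```

Each application halves the sequence length, so the features after a block describe a coarser time scale than the features before it.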
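During training, the authors treat the inter-cue path as the primary optimization target. The intra-cue path plays an auxiliary role, supplying the features of each individual cue for fusion. Accordingly, the objective function of the whole STMC framework combines the losses of both paths.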
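A plausible form of this objective, reconstructed from the description in these notes rather than copied from the paper, is given below. It assumes that both paths are trained with a CTC loss, that \(\alpha\) weights the sum of the auxiliary intra-cue losses over the \(N\) cues, and that \(\beta\) weights a keypoint regression loss from the spatial module. The symbols \(\mathcal{L}^{o}_{\mathrm{CTC}}\), \(\mathcal{L}^{f_n}_{\mathrm{CTC}}\) and \(\mathcal{L}_{R}\) are notation introduced here for illustration.

```latex
\mathcal{L} \;=\; \mathcal{L}^{o}_{\mathrm{CTC}}
  \;+\; \alpha \sum_{n=1}^{N} \mathcal{L}^{f_n}_{\mathrm{CTC}}
  \;+\; \beta \, \mathcal{L}_{R}
```

Under this reading, the first term is the main inter-cue loss, the second collects the auxiliary intra-cue losses, and the third is the pose regression term.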
For inference, we pass video frames through the SMC and TMC modules. Only the inter-cue feature sequence and its BLSTM encoder are used to generate the posterior probability distribution of glosses at all time steps. We use the beam search decoder (Hannun et al. 2014) to search the most probable sequence within an acceptable range (the beam width is set to 20).
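The decoding step can be illustrated with a small CTC prefix beam search. This is a generic textbook-style sketch without a language model, not the authors' decoder. The function names, the blank index 0, and the input format (one list of log-probabilities per time step) are assumptions. Only the default beam width of 20 comes from the text above.

```python
import math
from collections import defaultdict

NEG_INF = -float("inf")


def logsumexp(*values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf inputs."""
    if all(v == NEG_INF for v in values):
        return NEG_INF
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def ctc_beam_search(log_probs, beam_width=20, blank=0):
    """Return (best label sequence, its log-probability).

    log_probs: list of T frames, each a list of V log-probabilities.
    Each prefix keeps two scores: paths ending in blank (p_b) and
    paths ending in a non-blank label (p_nb).
    """
    beam = [((), (0.0, NEG_INF))]
    for frame in log_probs:
        next_beam = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beam:
            last = prefix[-1] if prefix else None
            for label, p in enumerate(frame):
                if label == blank:
                    # A blank keeps the prefix unchanged.
                    n_b, n_nb = next_beam[prefix]
                    next_beam[prefix] = (logsumexp(n_b, p_b + p, p_nb + p), n_nb)
                    continue
                extended = prefix + (label,)
                n_b, n_nb = next_beam[extended]
                if label != last:
                    n_nb = logsumexp(n_nb, p_b + p, p_nb + p)
                else:
                    # A repeated label only extends paths that ended in blank.
                    n_nb = logsumexp(n_nb, p_b + p)
                next_beam[extended] = (n_b, n_nb)
                if label == last:
                    # Repeating the last label without a blank merges into it.
                    n_b, n_nb = next_beam[prefix]
                    next_beam[prefix] = (n_b, logsumexp(n_nb, p_nb + p))
        ranked = sorted(next_beam.items(),
                        key=lambda item: logsumexp(*item[1]), reverse=True)
        beam = ranked[:beam_width]
    best_prefix, (p_b, p_nb) = beam[0]
    return list(best_prefix), logsumexp(p_b, p_nb)
```

Unlike greedy decoding, the search sums the probability of all alignments that collapse to the same gloss sequence, which is why each prefix tracks the blank-ending and non-blank-ending scores separately.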
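To obtain keypoint positions for training, the authors used the open-source HRNet toolbox to estimate the 7 upper-body keypoints described in the paper.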
Adam, lr = 5e-5, batch_size = 2, \(\alpha\) = 0.6, \(\beta\) = 30
Staged optimization strategy:
First, we train a VGG11-based network as DNF (Cui, Liu, and Zhang 2019) and use it to decode pseudo labels for each clip. Then, we add a fully-connected layer after each output of the TMC module. The STMC network without BLSTM is trained with cross-entropy and smooth-L1 loss by SGD optimizer. The batch size is 24 and the clip size is 16. Finally, with fine-tuned parameters from the previous stage, our full STMC network is trained end-to-end under joint loss optimization.
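As a reminder of what the smooth-L1 loss mentioned above computes, here is a minimal sketch. The threshold of 1.0 is the common default and the averaging over coordinates is an assumption. The excerpt does not state which variant the authors use, and the function name is illustrative.

```python
def smooth_l1(predicted, target):
    """Mean smooth-L1 loss between two equal-length lists of floats.

    Quadratic for small errors (|d| < 1), linear for large ones,
    which makes it less sensitive to outliers than a squared error.
    """
    total = 0.0
    for p, t in zip(predicted, target):
        diff = abs(p - t)
        total += 0.5 * diff * diff if diff < 1.0 else diff - 0.5
    return total / len(predicted)
```

For instance, errors of 0.5 and 2.0 contribute 0.125 and 1.5 respectively, so `smooth_l1([0.5, 3.0], [0.0, 1.0])` averages to 0.8125.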
Original post: https://www.cnblogs.com/august-en/p/12355146.html