Tags: rtc, accept, answer, with, product, com, modal, structure, embed
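When a model has to accept data from several modalities, it usually needs a suitable mechanism for fusing the information they carry. Joint embedding is a fairly common approach: map every modality into the same vector space, then fuse them there.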
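As a rough illustration of the idea, the sketch below projects an image feature and a text feature into one shared space and fuses them with an element-wise product. Everything in it is an assumption made for the example rather than something stated in the post: the function name `joint_embed`, the feature sizes (2048 and 512), the shared dimension (256), the `tanh` nonlinearity, the random projection matrices standing in for learned weights, and the choice of element-wise multiplication as the fusion step.

```python
import numpy as np


def joint_embed(image_feat, text_feat, w_img, w_txt):
    """Project two modalities into one shared space, then fuse them.

    Hypothetical helper for illustration only; in a real model the
    projection matrices would be learned, not sampled at random.
    """
    img_emb = np.tanh(image_feat @ w_img)  # (batch, shared_dim)
    txt_emb = np.tanh(text_feat @ w_txt)   # (batch, shared_dim)
    # Fusion step: element-wise product in the shared space.
    # Summation or concatenation are other possible choices.
    return img_emb * txt_emb


rng = np.random.default_rng(0)

# Assumed sizes: 4 samples, 2048-d image features, 512-d text features.
image_feat = rng.standard_normal((4, 2048))
text_feat = rng.standard_normal((4, 512))

# Random stand-ins for learned projections into a 256-d shared space.
w_img = rng.standard_normal((2048, 256)) * 0.01
w_txt = rng.standard_normal((512, 256)) * 0.01

fused = joint_embed(image_feat, text_feat, w_img, w_txt)
print(fused.shape)
```

Because both embeddings live in the same 256-dimensional space, the fusion step can be any operation that combines two vectors of equal length; the fused vector would then feed a downstream head such as an answer classifier.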
[1] Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
[2] Hadamard Product for Low-rank Bilinear Pooling
[3] Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering
[4] Multimodal Residual Learning for Visual QA
Original article: https://www.cnblogs.com/LukeStepByStep/p/11209317.html