Title | Cross-Graph Attention Enhanced Multi-Modal Correlation Learning for Fine-Grained Image-Text Retrieval |
Authors | |
Publication date | 2021-07-11 |
Conference name | 44th International ACM SIGIR Conference on Research and Development in Information Retrieval |
Proceedings title | SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval |
Pages | 1865-1869 |
Conference dates | July 11-15, 2021 |
Conference location | Electronic network (online) |
Abstract | Fine-grained image-text retrieval is a challenging but vital technology in the field of multimedia analysis. Existing methods mainly focus on learning a common embedding space for images (or patches) and sentences (or words), in which the similarity of their mapped features can be measured directly. Nevertheless, most existing image-text retrieval works rarely consider the shared semantic concepts that potentially correlate the heterogeneous modalities, which can enhance the discriminative power of the learned embedding space. To this end, we propose a Cross-Graph Attention model (CGAM) that explicitly learns the shared semantic concepts, which are then used to guide the feature learning of each modality and to promote common embedding learning. More specifically, we build a semantic-embedded graph for each modality and smooth the discrepancy between the two modalities via a cross-graph attention model to obtain shared semantic-enhanced features. Meanwhile, we reconstruct image and text features from the shared semantic concepts and the original embedding representations, and leverage a multi-head mechanism for similarity calculation. Accordingly, a discriminative, semantic-enhanced cross-modal embedding between image and text is obtained, which benefits fine-grained retrieval. Extensive experiments on benchmark datasets show performance improvements over state-of-the-art methods. |
Keywords | cross-graph attention; image-text retrieval; multi-head mechanism; shared semantic concept |
DOI | 10.1145/3404835.3463031 |
Indexed in | CPCI-S |
Language | English |
WOS research area | Computer Science |
WOS category | Computer Science, Information Systems |
WOS accession number | WOS:000719807900208 |
Scopus accession number | 2-s2.0-85111661437 |
Document type | Conference paper |
Item identifier | https://repository.uic.edu.cn/handle/39GCC9TT/13030 |
Collection | Individual research output produced outside this institution; Faculty of Science and Technology |
Author affiliations | 1. Dep. of Cs, Huaqiao University Provincial Key Lab. for Comput. Inf. Process. Technol., Soochow Univ., Xiamen, Suzhou, China 2. Dep. of Cs, Huaqiao University, Fujian Key Lab. of Big Data Intelligence and Security, Xiamen, China 3. Department of Computer Science, Hong Kong Baptist University, Hong Kong, Hong Kong 4. Dep. of Cs, Huaqiao University, Xiamen Key Lab. of Computer Vision and Pattern Recognition, Xiamen, China |
Recommended citation (GB/T 7714) | He, Yi, Liu, Xin, Cheung, Yiu Ming, et al. Cross-Graph Attention Enhanced Multi-Modal Correlation Learning for Fine-Grained Image-Text Retrieval[C], 2021: 1865-1869. |
Files in this item | There are no files associated with this item. |
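
The abstract describes cross-graph attention between the two modalities, a reconstruction from shared and original features, and a multi-head similarity. The record gives no equations, so the following is only a minimal NumPy sketch of those three ideas under stated assumptions, not the authors' CGAM implementation. Graph edges are omitted (each graph is reduced to its node features), the attention is plain scaled dot-product attention, the "reconstruction" is assumed to be a simple residual sum, and the function names, node counts, and dimensions are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_graph_attention(src, tgt):
    """Let every node in `src` attend over all nodes in `tgt`.

    Assumption: scaled dot-product attention without learned projections.
    Returns the attended features (n_src, d) and the weights (n_src, n_tgt).
    """
    d = src.shape[-1]
    weights = softmax(src @ tgt.T / np.sqrt(d), axis=-1)
    return weights @ tgt, weights

def multi_head_similarity(img_vec, txt_vec, num_heads):
    """Mean cosine similarity over equal-width slices ("heads") of two vectors.

    Assumption: heads are contiguous slices; the vector length must be
    divisible by `num_heads`.
    """
    img_heads = img_vec.reshape(num_heads, -1)
    txt_heads = txt_vec.reshape(num_heads, -1)
    num = (img_heads * txt_heads).sum(axis=1)
    den = np.linalg.norm(img_heads, axis=1) * np.linalg.norm(txt_heads, axis=1) + 1e-8
    return float((num / den).mean())

# Hypothetical node features: 5 image regions and 7 words, 8 dimensions each.
rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 8))
words = rng.normal(size=(7, 8))

# Attend in both directions across the two graphs.
attended_text, w_img = cross_graph_attention(regions, words)
attended_img, w_txt = cross_graph_attention(words, regions)

# Assumed reconstruction: original features plus cross-modal features,
# mean-pooled into one embedding per modality.
img_embed = (regions + attended_text).mean(axis=0)
txt_embed = (words + attended_img).mean(axis=0)

score = multi_head_similarity(img_embed, txt_embed, num_heads=4)
print(f"image-text similarity: {score:.4f}")
```

In a retrieval setting, a score of this kind would be computed for every image-text pair and used to rank candidates; a trained model would replace the random features and add learned projections.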