Research Output Details

Title: Cross-Graph Attention Enhanced Multi-Modal Correlation Learning for Fine-Grained Image-Text Retrieval
Authors:
Publication date: 2021-07-11
Conference name: 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
Proceedings title: SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
Pages: 1865-1869
Conference dates: July 11-15, 2021
Conference location: ELECTR NETWORK
Abstract:

Fine-grained image-text retrieval is a challenging but vital technology in the field of multimedia analysis. Existing methods mainly focus on learning a common embedding space for images (or patches) and sentences (or words), in which their mapped features can be directly compared. Nevertheless, most existing image-text retrieval works rarely consider the shared semantic concepts that potentially correlate the heterogeneous modalities, which can enhance the discriminative power of the learned embedding space. To this end, we propose a Cross-Graph Attention model (CGAM) to explicitly learn the shared semantic concepts, which can be used to guide the feature learning process of each modality and promote common embedding learning. More specifically, we build a semantic-embedded graph for each modality and smooth the discrepancy between the two modalities via a cross-graph attention model to obtain shared semantic-enhanced features. Meanwhile, we reconstruct image and text features via the shared semantic concepts and the original embedding representations, and leverage a multi-head mechanism for similarity calculation. Accordingly, a discriminative, semantic-enhanced cross-modal embedding between image and text is obtained, which benefits fine-grained retrieval. Extensive experiments on benchmark datasets show performance improvements over state-of-the-art methods.

Keywords: cross-graph attention; image-text retrieval; multi-head mechanism; shared semantic concept
DOI: 10.1145/3404835.3463031
Indexed in: CPCI-S
Language: English
WOS research area: Computer Science
WOS category: Computer Science, Information Systems
WOS accession number: WOS:000719807900208
Scopus ID: 2-s2.0-85111661437
Citation statistics:
Times cited: 26 [WOS]
Document type: Conference paper
Item identifier: https://repository.uic.edu.cn/handle/39GCC9TT/13030
Collections: Individual knowledge output outside this institution
Faculty of Science and Technology
Author affiliations:
1.Dep. of Cs,Huaqiao University Provincial Key Lab. for Comput. Inf. Process. Technol.,Soochow Univ.,Xiamen,Suzhou,China
2.Dep. of Cs,Huaqiao University,Fujian Key Lab. of Big Data Intelligence and Security,Xiamen,China
3.Department of Computer Science,Hong Kong Baptist University,Hong Kong,Hong Kong
4.Dep. of Cs,Huaqiao University,Xiamen Key Lab. of Computer Vision and Pattern Recognition,Xiamen,China
Recommended citation:
GB/T 7714
He, Yi, Liu, Xin, Cheung, Yiu Ming, et al. Cross-Graph Attention Enhanced Multi-Modal Correlation Learning for Fine-Grained Image-Text Retrieval[C], 2021: 1865-1869.