Cross-Graph Attention Enhanced Multi-Modal Correlation Learning for Fine-Grained Image-Text Retrieval

doi:10.1145/3404835.3463031

Research Output Details

Title	Cross-Graph Attention Enhanced Multi-Modal Correlation Learning for Fine-Grained Image-Text Retrieval
Authors	He, Yi 1; Liu, Xin 2; Cheung, Yiu Ming 3; Peng, Shujuan 4; Yi, Jinhan 2; Fan, Wentao 4
Publication date	2021-07-11
Conference name	44th International ACM SIGIR Conference on Research and Development in Information Retrieval
Proceedings title	SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
Pages	1865-1869
Conference dates	July 11-15, 2021
Conference location	Electronic network (virtual)
Abstract	Fine-grained image-text retrieval is a challenging but vital technology in the field of multimedia analysis. Existing methods mainly focus on learning a common embedding space for images (or patches) and sentences (or words), in which their mapped features can be directly compared. Nevertheless, most existing image-text retrieval works rarely consider the shared semantic concepts that potentially correlate the heterogeneous modalities, even though these concepts can enhance the discriminative power of the learned embedding space. To this end, we propose a Cross-Graph Attention model (CGAM) to explicitly learn the shared semantic concepts, which can be used to guide the feature learning process of each modality and promote common embedding learning. More specifically, we build a semantic-embedded graph for each modality and smooth the discrepancy between the two modalities via the cross-graph attention model to obtain shared semantic-enhanced features. Meanwhile, we reconstruct image and text features from the shared semantic concepts and the original embedding representations, and leverage a multi-head mechanism for similarity calculation. Accordingly, a discriminative semantic-enhanced cross-modal embedding between image and text is obtained, which benefits fine-grained retrieval. Extensive experiments on benchmark datasets show performance improvements over state-of-the-art methods.
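The abstract describes the pipeline only at a high level. The Python sketch below illustrates the general ideas it names (attention within and between the node sets of two modality graphs, then a multi-head similarity over embedding chunks) under loudly stated assumptions: the fully connected graphs, scaled dot-product attention, residual fusion, mean pooling, and per-head cosine scoring are generic choices, not the CGAM formulation from the paper, and every function name (`attend`, `multi_head_similarity`, `image_text_score`, etc.) is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys):
    """Scaled dot-product attention from each query node to all key nodes.

    Each query node is fused with its attended context by a residual sum.
    With keys == queries this acts as one message-passing step on a fully
    connected intra-modal graph; with keys from the other modality it is an
    ASSUMED form of cross-graph attention. All nodes must share one dimension.
    """
    d = len(queries[0])
    fused = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        context = [sum(w * k[j] for w, k in zip(weights, keys)) for j in range(d)]
        fused.append([qj + cj for qj, cj in zip(q, context)])
    return fused


def mean_pool(nodes):
    """Average node features into a single global embedding."""
    return [sum(col) / len(nodes) for col in zip(*nodes)]


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return 0.0 if na == 0 or nb == 0 else dot(a, b) / (na * nb)


def multi_head_similarity(u, v, num_heads):
    """Split both embeddings into equal chunks ("heads"), score each chunk by
    cosine similarity, and average the per-head scores (an assumed scheme)."""
    if len(u) != len(v) or len(u) % num_heads != 0:
        raise ValueError("embeddings must match in length and divide evenly into heads")
    h = len(u) // num_heads
    sims = [cosine(u[i * h:(i + 1) * h], v[i * h:(i + 1) * h]) for i in range(num_heads)]
    return sum(sims) / num_heads


def image_text_score(regions, words, num_heads=2):
    """Hypothetical end-to-end score for one image-sentence pair."""
    regions = attend(regions, regions)       # intra-modal graph step (image)
    words = attend(words, words)             # intra-modal graph step (text)
    img = mean_pool(attend(regions, words))  # image nodes attend to text nodes
    txt = mean_pool(attend(words, regions))  # text nodes attend to image nodes
    return multi_head_similarity(img, txt, num_heads)


if __name__ == "__main__":
    # Toy 4-d features: 2 image regions and 3 words (made-up numbers).
    regions = [[1.0, 0.0, 0.5, 0.2], [0.0, 1.0, 0.1, 0.9]]
    words = [[0.9, 0.1, 0.4, 0.3], [0.1, 0.8, 0.2, 0.7], [0.5, 0.5, 0.5, 0.5]]
    print(round(image_text_score(regions, words), 4))
```

In a retrieval setting, a score like this would be computed for every candidate sentence (or image) and the candidates ranked by it; a trained model would replace the toy feature lists with learned region and word embeddings.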
Keywords	cross-graph attention; image-text retrieval; multi-head mechanism; shared semantic concept
DOI	10.1145/3404835.3463031
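Indexed in	CPCI-S
Language	English
WoS research area	Computer Science
WoS category	Computer Science, Information Systems
WoS accession number	WOS:000719807900208
Scopus EID	2-s2.0-85111661437
Citation statistics	Times cited: 26 (WoS)
Document type	Conference paper
Item identifier	https://repository.uic.edu.cn/handle/39GCC9TT/13030
Collection	Personal research outputs produced outside this institution; Faculty of Science and Technology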
Author affiliations	1. Department of Computer Science, Huaqiao University, Xiamen, China; Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, China
	2. Department of Computer Science, Huaqiao University; Fujian Key Laboratory of Big Data Intelligence and Security, Xiamen, China
	3. Department of Computer Science, Hong Kong Baptist University, Hong Kong
	4. Department of Computer Science, Huaqiao University; Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Xiamen, China
Recommended citation (GB/T 7714)	He, Yi, Liu, Xin, Cheung, Yiu Ming, et al. Cross-Graph Attention Enhanced Multi-Modal Correlation Learning for Fine-Grained Image-Text Retrieval[C], 2021: 1865-1869.