Audio-visual speaker recognition via multi-modal correlated neural networks

doi:10.1109/WIW.2016.47

Details of Research Outputs

Title	Audio-visual speaker recognition via multi-modal correlated neural networks
Creator	Geng, Jiajia 1; Liu, Xin 1; Cheung, Yiu Ming 2,3
Date Issued	2017-01-11
Conference Name	IEEE/WIC/ACM International Conference on Web Intelligence (WI)
Source Publication	Proceedings - 2016 IEEE/WIC/ACM International Conference on Web Intelligence Workshops, WIW 2016
Pages	123-128
Conference Date	OCT 13-16, 2016
Conference Place	Omaha
Abstract	Multi-modal speaker recognition has received considerable attention in recent years due to growing security demands in real applications. In this paper, we present an efficient audio-visual speaker recognition method that fuses face and audio via multi-modal correlated neural networks. Within our proposed approach, the facial features learned by convolutional neural networks are compatible with audio features at a high level, and the heterogeneous multi-modal features can be learned automatically. Accordingly, we propose a correlated neural network to fuse the face and audio modalities at different levels such that the speaker identity can be well identified. The experimental results show that our proposed multi-modal speaker recognition approach performs better than a single modality, and that feature-level fusion yields comparable and even better results than decision-level fusion.
DOI	10.1109/WIW.2016.47
Indexed By	CPCI-S
Language	English
WOS Research Area	Computer Science
WOS Subject	Computer Science ; Artificial Intelligence ; Computer Science, Information Systems
WOS ID	WOS:000404435600031
Scopus ID	2-s2.0-85013648143
Citation statistics	Cited Times: 7 [WOS]
Document Type	Conference paper
Identifier	http://repository.uic.edu.cn/handle/39GCC9TT/6373
Collection	Beijing Normal-Hong Kong Baptist University
Corresponding Author	Liu, Xin
Affiliation	1. Department of Computer Science and Technology, Huaqiao University, Xiamen, China; 2. Department of Computer Science, Hong Kong Baptist University, Hong Kong; 3. United International College, BNU-HKBU, Zhuhai, China
Recommended Citation GB/T 7714	Geng, Jiajia, Liu, Xin, Cheung, Yiu Ming. Audio-visual speaker recognition via multi-modal correlated neural networks[C], 2017: 123-128.
