Details of Research Outputs

Title: Audio-visual speaker recognition via multi-modal correlated neural networks
Creator: Geng, Jiajia; Liu, Xin; Cheung, Yiu Ming
Date Issued: 2017-01-11
Conference Name: IEEE/WIC/ACM International Conference on Web Intelligence (WI)
Source Publication: Proceedings - 2016 IEEE/WIC/ACM International Conference on Web Intelligence Workshops, WIW 2016
Pages: 123-128
Conference Date: October 13-16, 2016
Conference Place: Omaha
Abstract

Multi-modal speaker recognition has received considerable attention in recent years due to growing security demands in real-world applications. In this paper, we present an efficient audio-visual speaker recognition method that fuses face and audio via multi-modal correlated neural networks. In the proposed approach, the facial features learned by convolutional neural networks are compatible with the audio features at a high level, and the heterogeneous multi-modal features can be learned automatically. Accordingly, we propose a correlated neural network that fuses the face and audio modalities at different levels so that the speaker can be identified accurately. The experimental results show that the proposed multi-modal approach outperforms either single modality, and that feature-level fusion yields comparable and even better results than decision-level fusion.

DOI: 10.1109/WIW.2016.47
Indexed By: CPCI-S
Language: English
WOS Research Area: Computer Science
WOS Subject: Computer Science; Artificial Intelligence; Computer Science, Information Systems
WOS ID: WOS:000404435600031
Scopus ID: 2-s2.0-85013648143
Citation statistics
Cited Times: 7 [WOS]
Document Type: Conference paper
Identifier: http://repository.uic.edu.cn/handle/39GCC9TT/6373
Collection: Beijing Normal-Hong Kong Baptist University
Corresponding Author: Liu, Xin
Affiliation
1. Department of Computer Science and Technology, Huaqiao University, Xiamen, China
2. Department of Computer Science, Hong Kong Baptist University, Hong Kong
3. United International College, BNU - HKBU, Zhuhai, China
Recommended Citation
GB/T 7714
Geng, Jiajia, Liu, Xin, Cheung, Yiu Ming. Audio-visual speaker recognition via multi-modal correlated neural networks[C], 2017: 123-128.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.