Chinese Academy of Sciences Institutional Repositories Grid

A multi-channel/multi-speaker interactive 3D Audio-Visual Speech Corpus in Mandarin

Document type: Conference paper


Authors	Jun Yu; Rongfeng Su; Lan Wang; Wenpeng Zhou
Publication date	2016
Conference name	ISCSLP2016
Conference location	Tianjin, China
Abstract	This paper presents a multi-channel, multi-speaker 3D audio-visual corpus for Mandarin continuous speech recognition and related fields such as speech visualization and speech synthesis. The corpus comprises about 18k utterances from 24 speakers, about 20 hours in total. For each utterance, the audio streams were recorded by two professional microphones, one in the near field and one in the far field, while a marker-based 3D facial motion capture system with six infrared cameras acquired the 3D video streams. The corresponding 2D video streams were captured by an additional camera as a supplement. The paper describes a data processing procedure for synchronizing the audio and video streams, detecting and correcting outliers, and removing head motion that occurred during recording. Results of the data processing are also discussed. To date, this corpus is the largest 3D audio-visual corpus for Mandarin.
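Indexed by	EI
Language	English
Source URL	[http://ir.siat.ac.cn:8080/handle/172644/10032]
Collection	Shenzhen Institutes of Advanced Technology_Institute of Advanced Integration Technology
Author affiliation	2016
Recommended citation (GB/T 7714)	Jun Yu,Rongfeng Su,Lan Wang,et al. A multi-channel/multi-speaker interactive 3D Audio-Visual Speech Corpus in Mandarin[C]. In: ISCSLP2016. Tianjin, China.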

Ingest method: OAI harvesting

Source: Shenzhen Institutes of Advanced Technology
