Chinese Academy of Sciences Institutional Repositories Grid: Everybody’s Talkin’: Let Me Talk as You Want

Chinese Academy of Sciences Institutional Repositories Grid

Everybody’s Talkin’: Let Me Talk as You Want

Document type: Journal article


Authors	Song LS (宋林森) 1,2; Wu WY (吴文岩) 4; Qian C (钱晨) 4; He R (赫然) 1,2; Loy, Chen Change 3
Journal	IEEE Transactions on Information Forensics and Security
Publication date	2022-01-26
Volume	17 Issue: 1 Pages: 585-598
Keywords	Talking face generation; Video generation; GAN; Audio dubbing
ISSN	1556-6013
DOI	10.1109/TIFS.2022.3146783
Abstract	We present a method to edit a target portrait footage by taking a sequence of audio as input to synthesize a photo-realistic video. This method is unique because it is highly dynamic. It does not assume a person-specific rendering network yet capable of translating one source audio into one random chosen video output within a set of speech videos. Instead of learning a highly heterogeneous and nonlinear mapping from audio to the video directly, we first factorize each target video frame into orthogonal parameter spaces, i.e., expression, geometry, and pose, via monocular 3D face reconstruction. Next, a recurrent network is introduced to translate source audio into expression parameters that are primarily related to the audio content. The audio-translated expression parameters are then used to synthesize a photo-realistic human subject in each video frame, with the movement of the mouth regions precisely mapped to the source audio. The geometry and pose parameters of the target human portrait are retained, therefore preserving the context of the original video footage. Finally, we introduce a novel video rendering network and a dynamic programming method to construct a temporally coherent and photo-realistic video. Extensive experiments demonstrate the superiority of our method over existing approaches. Our method is end-to-end learnable and robust to voice variations in the source audio.
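URL	View original
Language	English
Source URL	[http://ir.ia.ac.cn/handle/173211/52260]
Collection	Institute of Automation_Center for Research on Intelligent Perception and Computing
Corresponding author	Loy, Chen Change
Author affiliations	1. Institute of Automation, Chinese Academy of Sciences 2. University of Chinese Academy of Sciences 3. Nanyang Technological University 4. SenseTime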
Recommended citation GB/T 7714	Song LS, Wu WY, Qian C, et al. Everybody’s Talkin’: Let Me Talk as You Want[J]. IEEE Transactions on Information Forensics and Security, 2022, 17(1): 585-598.
APA	Song LS, Wu WY, Qian C, He R, & Loy, Chen Change. (2022). Everybody’s Talkin’: Let Me Talk as You Want. IEEE Transactions on Information Forensics and Security, 17(1), 585-598.
MLA	Song LS, et al. "Everybody’s Talkin’: Let Me Talk as You Want". IEEE Transactions on Information Forensics and Security 17.1 (2022): 585-598.
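
The abstract describes replacing only the expression parameters of each target frame with audio-derived ones while keeping geometry and pose. A minimal sketch of that parameter swap follows. It is not the authors' implementation: the names `FaceParams` and `retarget_expression` are hypothetical, the parameter vectors are placeholder tuples, and the 3D face reconstruction, the recurrent audio network, and the rendering network are all assumed to exist elsewhere.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class FaceParams:
    """Hypothetical per-frame factorization into three parameter groups."""
    expression: Vector  # mouth and facial motion, to be driven by the audio
    geometry: Vector    # face shape of the target person, kept unchanged
    pose: Vector        # head pose of the target footage, kept unchanged


def retarget_expression(
    target_frames: Sequence[FaceParams],
    audio_expressions: Sequence[Vector],
) -> List[FaceParams]:
    """Swap in audio-derived expression parameters frame by frame.

    Geometry and pose are copied from the target frames, so the context
    of the original footage is preserved.
    """
    if len(target_frames) != len(audio_expressions):
        raise ValueError("need one expression vector per target frame")
    return [
        replace(frame, expression=expr)
        for frame, expr in zip(target_frames, audio_expressions)
    ]


# Placeholder values standing in for reconstructed target frames and for
# expression parameters predicted from the source audio.
target = [
    FaceParams(expression=(0.0, 0.0), geometry=(1.0, 2.0), pose=(0.1, 0.2, 0.3)),
    FaceParams(expression=(0.0, 0.1), geometry=(1.0, 2.0), pose=(0.1, 0.2, 0.4)),
]
predicted = [(0.5, 0.6), (0.7, 0.8)]
edited = retarget_expression(target, predicted)
```

In this sketch `edited` would then be handed to a renderer. The temporal-coherence step mentioned in the abstract is not modeled here.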

Ingest method: OAI harvesting

Source: Institute of Automation

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.