Chinese Academy of Sciences Institutional Repositories Grid
Audio-driven Dubbing for User Generated Contents via Style-aware Semi-parametric Synthesis

Document Type: Journal Article

Authors: Song LS (宋林森)2,4; Wu WY (吴文岩)3; Fu CY (傅朝友)2,4; Loy, Chen Change1; He R (赫然)2,4
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Publication Date: 2022-09-26
Volume: 33  Issue: 3  Pages: 1247-1261
Keywords: Talking Face Generation; Video Generation; GAN; Thin-plate Spline
DOI: 10.1109/TCSVT.2022.3210002
Abstract

Existing automated dubbing methods are usually designed for Professionally Generated Content (PGC) production, which requires massive training data and training time to learn a person-specific audio-video mapping. In this paper, we investigate an audio-driven dubbing method that is more feasible for User Generated Content (UGC) production. There are two unique challenges in designing a method for UGC: 1) the appearances of speakers are diverse and arbitrary, as the method needs to generalize across users; 2) the available video data for any one speaker are very limited. To tackle these challenges, we first introduce a new Style Translation Network that integrates the speaking style of the target and the speaking content of the source via a cross-modal AdaIN module, enabling our model to quickly adapt to a new speaker. We then develop a semi-parametric video renderer, which takes full advantage of the limited training data of the unseen speaker via a video-level retrieve-warp-refine pipeline. Finally, we propose a temporal regularization for the semi-parametric renderer, yielding more continuous videos. Extensive experiments show that our method generates videos that accurately preserve various speaking styles, yet with a considerably smaller amount of training data and training time than existing methods. In addition, our method achieves a faster testing speed than most recent methods.
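The cross-modal AdaIN idea mentioned in the abstract can be pictured with a minimal sketch: a speaker-style embedding predicts per-channel scale and bias that re-normalize audio-derived content features. This is only an illustrative PyTorch-style sketch; the class name, dimensions, and layer choices below are assumptions for exposition and are not taken from the paper's implementation.

```python
# Illustrative sketch only: a cross-modal AdaIN layer. All names
# (CrossModalAdaIN, content_dim, style_dim) are hypothetical and not
# drawn from the paper's released code.
import torch
import torch.nn as nn


class CrossModalAdaIN(nn.Module):
    """Modulate audio-derived content features with a speaker style embedding."""

    def __init__(self, content_dim: int, style_dim: int):
        super().__init__()
        # Instance normalization removes the content features' own statistics.
        self.norm = nn.InstanceNorm1d(content_dim, affine=False)
        # A linear layer maps the style embedding to per-channel (scale, bias).
        self.affine = nn.Linear(style_dim, 2 * content_dim)

    def forward(self, content: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        # content: (batch, content_dim, time) audio-driven content features
        # style:   (batch, style_dim)         target-speaker style embedding
        scale, bias = self.affine(style).chunk(2, dim=1)
        normalized = self.norm(content)
        return normalized * (1 + scale.unsqueeze(-1)) + bias.unsqueeze(-1)


if __name__ == "__main__":
    layer = CrossModalAdaIN(content_dim=256, style_dim=128)
    audio_feat = torch.randn(2, 256, 40)   # e.g., 40 audio frames
    style_emb = torch.randn(2, 128)
    print(layer(audio_feat, style_emb).shape)  # torch.Size([2, 256, 40])
```

Because the style information enters only through the predicted scale and bias, the same audio content can in principle be rendered in different speaking styles by swapping the style embedding, which is consistent with the fast speaker adaptation described in the abstract.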

Language: English
Source URL: http://ir.ia.ac.cn/handle/173211/52261
Collection: Institute of Automation, Center for Research on Intelligent Perception and Computing
Corresponding Author: He R (赫然)
Author Affiliations:
1. Nanyang Technological University
2. Institute of Automation, Chinese Academy of Sciences
3. Beijing SenseTime Technology Co., Ltd.
4. University of Chinese Academy of Sciences
Recommended Citation
GB/T 7714
Song LS, Wu WY, Fu CY, et al. Audio-driven Dubbing for User Generated Contents via Style-aware Semi-parametric Synthesis[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 33(3): 1247-1261.
APA Song LS, Wu WY, Fu CY, Loy, Chen Change, & He R. (2022). Audio-driven Dubbing for User Generated Contents via Style-aware Semi-parametric Synthesis. IEEE Transactions on Circuits and Systems for Video Technology, 33(3), 1247-1261.
MLA Song LS, et al. "Audio-driven Dubbing for User Generated Contents via Style-aware Semi-parametric Synthesis." IEEE Transactions on Circuits and Systems for Video Technology 33.3 (2022): 1247-1261.

Deposit Method: OAI Harvesting

Source: Institute of Automation

