Chinese Academy of Sciences Institutional Repositories Grid
Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms

Document Type: Conference Paper

Authors: Zhang, Chu Yuan (2,3); Yi, Jiangyan (3); Tao, Jianhua (1,4); Wang, Chenglong (3); Yan, Xinrui (2,3)
Date Published: 2024-05-25
Conference Date: 2024-07-27
Conference Venue: Taiyuan, Shanxi, China
Abstract

Recent advancements in neural speech synthesis technologies have brought about widespread applications but have also raised concerns about potential misuse and abuse. Addressing these challenges is crucial, particularly in the realms of forensics and intellectual property protection. While previous research on source attribution of synthesized speech has its limitations, our study aims to fill these gaps by investigating the identification of sources in synthesized speech. We focus on analyzing speech synthesis model fingerprints in generated speech waveforms, emphasizing the roles of the acoustic model and vocoder. Our research, based on the multi-speaker LibriTTS dataset, reveals two key insights: (1) both vocoders and acoustic models leave distinct, model-specific fingerprints on generated waveforms, and (2) vocoder fingerprints, being more dominant, may obscure those from the acoustic model. These findings underscore the presence of model-specific fingerprints in both components, suggesting their potential significance in source identification applications.
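The abstract's central claim, that each synthesis component leaves a measurable, model-specific fingerprint in the generated waveform, can be illustrated with a toy sketch. Everything below (the two simulated "vocoders", the scalar fingerprint statistic, the nearest-centroid attribution) is a hypothetical simplification for intuition only, not the authors' models, features, or classifier:

```python
import math
import random

# Toy illustration (NOT the paper's method): two simulated "vocoders"
# imprint different artifacts on the same source signal, and a
# nearest-centroid classifier over one spectral statistic recovers
# which model produced each clip.

def vocoder_a(x):
    """Adds a faint near-Nyquist ripple (one model-specific artifact)."""
    return [s + 0.05 * math.sin(0.9 * math.pi * i) for i, s in enumerate(x)]

def vocoder_b(x):
    """Lightly smooths adjacent samples (a different artifact)."""
    return [0.5 * (s + x[min(i + 1, len(x) - 1)]) for i, s in enumerate(x)]

def fingerprint(x):
    """Mean absolute first difference: a crude proxy for high-band energy."""
    return sum(abs(x[i + 1] - x[i]) for i in range(len(x) - 1)) / (len(x) - 1)

def make_clip(rng, n=400, freq=0.02):
    """A low-frequency, 'speech-like' source tone with a random phase."""
    phase = rng.uniform(0.0, 2.0 * math.pi)
    return [math.sin(2.0 * math.pi * freq * i + phase) for i in range(n)]

rng = random.Random(0)
models = ((vocoder_a, "A"), (vocoder_b, "B"))

# Estimate one centroid fingerprint per model from 20 training clips each.
train = [(fingerprint(v(make_clip(rng))), name)
         for _ in range(20) for v, name in models]
centroid = {name: sum(f for f, m in train if m == name) / 20
            for name in ("A", "B")}

def attribute(clip):
    """Attribute a generated clip to the model with the nearest centroid."""
    f = fingerprint(clip)
    return min(centroid, key=lambda name: abs(f - centroid[name]))

# Evaluate source attribution on 40 fresh clips (20 per model).
correct = sum(attribute(v(make_clip(rng))) == name
              for _ in range(20) for v, name in models)
print(f"attributed {correct}/40 clips correctly")
```

In the paper's actual setting the fingerprints come from real acoustic models and vocoders on LibriTTS speech, and attribution uses learned features rather than a single hand-picked statistic; the sketch only conveys why a consistent, model-specific artifact makes source identification possible.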

Language: English
Source URL: [http://ir.ia.ac.cn/handle/173211/57607]
Research Group: State Key Laboratory of Multimodal Artificial Intelligence Systems
Corresponding Author: Tao, Jianhua
Affiliations:
1. Department of Automation, Tsinghua University
2. University of Chinese Academy of Sciences
3. Institute of Automation, Chinese Academy of Sciences
4. Beijing National Research Center for Information Science and Technology, Tsinghua University
Recommended Citation (GB/T 7714):
Zhang, Chu Yuan, Yi, Jiangyan, Tao, Jianhua, et al. Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms[C]. Taiyuan, Shanxi, China, 2024-07-27.

Deposit Method: OAI Harvesting

Source: Institute of Automation


Unless otherwise noted, all content in this system is protected by copyright, with all rights reserved.