Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms
Document type | Conference paper
Author | Zhang, Chu Yuan2,3 |
Publication date | 2024-05-25 |
Conference date | 2024-07-27 |
Conference location | Taiyuan, Shanxi, China |
Abstract (English) | Recent advancements in neural speech synthesis technologies have brought about widespread applications but have also raised concerns about potential misuse and abuse. Addressing these challenges is crucial, particularly in the realms of forensics and intellectual property protection. While previous research on source attribution of synthesized speech has its limitations, our study aims to fill these gaps by investigating the identification of sources in synthesized speech. We focus on analyzing speech synthesis model fingerprints in generated speech waveforms, emphasizing the roles of the acoustic model and vocoder. Our research, based on the multi-speaker LibriTTS dataset, reveals two key insights: (1) both vocoders and acoustic models leave distinct, model-specific fingerprints on generated waveforms, and (2) vocoder fingerprints, being more dominant, may obscure those from the acoustic model. These findings underscore the presence of model-specific fingerprints in both components, suggesting their potential significance in source identification applications. |
Language | English |
Source URL | [http://ir.ia.ac.cn/handle/173211/57607] |
Research unit | State Key Laboratory of Multimodal Artificial Intelligence Systems |
Corresponding author | Tao, Jianhua |
Author affiliations | 1. Department of Automation, Tsinghua University; 2. University of Chinese Academy of Sciences; 3. Institute of Automation, Chinese Academy of Sciences; 4. Beijing National Research Center for Information Science and Technology, Tsinghua University |
Recommended citation (GB/T 7714) | Zhang, Chu Yuan, Yi, Jiangyan, Tao, Jianhua, et al. Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms[C]. Taiyuan, Shanxi, China, 2024-07-27. |
Deposit method: OAI harvesting
Source: Institute of Automation
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.