Chinese Academy of Sciences Institutional Repositories Grid
Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings

Document type: Journal article

Authors: Yi, Jiangyan (2); Tao, Jianhua (1,4); Fu, Ruibo (2); Wang, Tao (2); Zhang, Chu Yuan (2); Wang, Chenglong (3)
Journal: IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
Publication date: 2023
Volume: 31; Pages: 2963-2973
Keywords: Adversarial training; multi-task learning; prosodic boundaries; speech synthesis; multi-modal embeddings
ISSN: 2329-9290
DOI: 10.1109/TASLP.2023.3301235
Corresponding authors: Yi, Jiangyan (jiangyan.yi@nlpr.ia.ac.cn); Tao, Jianhua (jhtao@tsinghua.edu.cn)
Abstract: Prosodic boundaries are still crucial to the naturalness of end-to-end speech synthesis systems. This article proposes to use adversarial multi-task learning to predict prosodic boundaries. Adversarial multi-task learning is utilized to transfer knowledge from an auxiliary POS tagging task to a prosodic boundary prediction task. Furthermore, multi-modal embeddings are composed of contextual word and speech embedding features obtained from the pre-trained bidirectional encoder representations from transformers (BERT) model and Speech2Vec. We can utilize linguistic and acoustic information from large amounts of external text and speech data without prosodic boundary labels. At the inference stage, the prosodic boundary prediction model can use the syntactic features learnt from the POS tagging task without any extra computation cost, because only the prosodic boundary prediction task is used for decoding. We conducted experiments on Mandarin datasets. The results show that the models using multi-modal embeddings from the pre-trained BERT and Speech2Vec outperform the models trained with single-modal embeddings. Furthermore, the models trained with adversarial training obtain further performance gains of up to 2.95% in F1 score.
WOS keywords: SPEECH SYNTHESIS; SEQUENCE; MODEL
Funding projects: National Natural Science Foundation of China (NSFC) [61831022]; National Natural Science Foundation of China (NSFC) [U21B2010]; National Natural Science Foundation of China (NSFC) [62101553]; National Natural Science Foundation of China (NSFC) [61971419]; National Natural Science Foundation of China (NSFC) [62006223]; National Natural Science Foundation of China (NSFC) [62276259]; National Natural Science Foundation of China (NSFC) [62201572]; National Natural Science Foundation of China (NSFC) [62206278]; Beijing Municipal Science and Technology Commission, Administrative Commission of Zhongguancun Science [Z211100004821013]
WOS research areas: Acoustics; Engineering
Language: English
WOS accession number: WOS:001045259400002
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Funding agencies: National Natural Science Foundation of China (NSFC); Beijing Municipal Science and Technology Commission, Administrative Commission of Zhongguancun Science
Source URL: http://ir.ia.ac.cn/handle/173211/53906
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems
Corresponding authors: Yi, Jiangyan; Tao, Jianhua
Author affiliations:
1. Tsinghua Univ, Dept Automat, Beijing 100190, Peoples R China
2. Chinese Acad Sci, Univ Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence Syst, Beijing 101408, Peoples R China
3. Univ Chinese Acad Sci, Beijing 101408, Peoples R China
4. Univ Sci & Technol China, Sch Artificial Intelligence, Hefei 230026, Peoples R China
Recommended citation:
GB/T 7714: Yi, Jiangyan, Tao, Jianhua, Fu, Ruibo, et al. Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31: 2963-2973.
APA: Yi, Jiangyan, Tao, Jianhua, Fu, Ruibo, Wang, Tao, Zhang, Chu Yuan, & Wang, Chenglong. (2023). Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 31, 2963-2973.
MLA: Yi, Jiangyan, et al. "Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings". IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 31 (2023): 2963-2973.

Ingest method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.