Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings
Document type: Journal article
Author | Yi, Jiangyan (affiliation 2) |
Journal | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING |
Publication date | 2023 |
Volume | 31; Pages: 2963-2973 |
Keywords | Adversarial training; multi-task learning; prosodic boundaries; speech synthesis; multi-modal embeddings |
ISSN | 2329-9290 |
DOI | 10.1109/TASLP.2023.3301235 |
Corresponding authors | Yi, Jiangyan (jiangyan.yi@nlpr.ia.ac.cn); Tao, Jianhua (jhtao@tsinghua.edu.cn) |
Abstract | Prosodic boundaries are still crucial to the naturalness of end-to-end speech synthesis systems. This article proposes to use adversarial multi-task learning to predict prosodic boundaries. Adversarial multi-task learning is utilized to transfer knowledge from an auxiliary POS tagging task to a prosodic boundary prediction task. Furthermore, multi-modal embeddings are composed of contextual word and speech embedding features obtained from the pre-trained bidirectional encoder representations from transformers (BERT) model and Speech2Vec. This allows the model to utilize linguistic and acoustic information from large amounts of external text and speech data without prosodic boundary labels. At the inference stage, the prosodic boundary prediction model can use the syntactic features learnt from the POS tagging task without any extra computation cost, because only the prosodic boundary prediction task is used for decoding. We conducted experiments on Mandarin datasets. The results show that the models using multi-modal embeddings from the pre-trained BERT and Speech2Vec outperform the models trained with a single-modal embedding. Furthermore, the models trained with adversarial training obtain further performance gains of up to 2.95% in F1 score. |
WOS keywords | SPEECH SYNTHESIS ; SEQUENCE ; MODEL |
Funding projects | National Natural Science Foundation of China (NSFC)[61831022] ; National Natural Science Foundation of China (NSFC)[U21B2010] ; National Natural Science Foundation of China (NSFC)[62101553] ; National Natural Science Foundation of China (NSFC)[61971419] ; National Natural Science Foundation of China (NSFC)[62006223] ; National Natural Science Foundation of China (NSFC)[62276259] ; National Natural Science Foundation of China (NSFC)[62201572] ; National Natural Science Foundation of China (NSFC)[62206278] ; Beijing Municipal Science and Technology Commission, Administrative Commission of Zhongguancun Science[Z211100004821013] |
WOS research areas | Acoustics ; Engineering |
Language | English |
WOS accession number | WOS:001045259400002 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
Funding organizations | National Natural Science Foundation of China (NSFC) ; Beijing Municipal Science and Technology Commission, Administrative Commission of Zhongguancun Science |
Source URL | [http://ir.ia.ac.cn/handle/173211/53906] |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems |
Corresponding authors | Yi, Jiangyan; Tao, Jianhua |
Author affiliations | 1. Tsinghua Univ, Dept Automat, Beijing 100190, Peoples R China; 2. Chinese Acad Sci, Univ Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence Syst, Beijing 101408, Peoples R China; 3. Univ Chinese Acad Sci, Beijing 101408, Peoples R China; 4. Univ Sci & Technol China, Sch Artificial Intelligence, Hefei 230026, Peoples R China |
Recommended citation (GB/T 7714) | Yi, Jiangyan,Tao, Jianhua,Fu, Ruibo,et al. Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING,2023,31:2963-2973. |
APA | Yi, Jiangyan,Tao, Jianhua,Fu, Ruibo,Wang, Tao,Zhang, Chu Yuan,&Wang, Chenglong.(2023).Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings.IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING,31,2963-2973. |
MLA | Yi, Jiangyan,et al."Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings".IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 31(2023):2963-2973. |
Ingest method: OAI harvesting
Source: Institute of Automation