Improving diversity of speech-driven gesture generation with memory networks as dynamic dictionaries
Document type: Journal article
Authors | Zhao, Zeyu1,2; Gao, Nan1 |
Journal | CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY |
Publication date | 2024-04-22 |
Pages | 15 |
Keywords | artificial intelligence gesture |
ISSN | 2468-6557 |
DOI | 10.1049/cit2.12321 |
Corresponding author | Zeng, Zhi(zhi.zeng@bupt.edu.cn) |
Abstract (English) | Generating co-speech gestures for interactive digital humans remains challenging because of the indeterministic nature of the problem. The authors observe that gestures generated from speech audio or text by existing neural methods often contain less movement shift than expected, which can be viewed as slow or dull. Thus, a new generative model coupled with memory networks as dynamic dictionaries for speech-driven gesture generation with improved diversity is proposed. More specifically, the dictionary network dynamically stores connections between text and pose features in a list of key-value pairs as the memory for the pose generation network to look up; the pose generation network then merges the matching pose features and input audio features for generating the final pose sequences. To make the improvements more accurately measurable, a new objective evaluation metric for gesture diversity that can remove the influence of low-quality motions is also proposed and tested. Quantitative and qualitative experiments demonstrate that the proposed architecture succeeds in generating gestures with improved diversity. |
Funding projects | National Key R&D Program of China[2022YFF0902202] ; National Key R&D Programme of China |
WOS research area | Computer Science |
Language | English |
WOS accession number | WOS:001206843300001 |
Publisher | WILEY |
Funding agencies | National Key R&D Program of China ; National Key R&D Programme of China |
Source URL | [http://ir.ia.ac.cn/handle/173211/58249] |
Collection | Research Center for Digital Content Technology and Services_New Media Service and Management Technology |
Corresponding author | Zeng, Zhi |
Author affiliations | 1.Chinese Acad Sci, Inst Automat, Beijing, Peoples R China 2.Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China 3.Beijing Univ Posts & Telecommun, Beijing, Peoples R China |
Recommended citation (GB/T 7714) | Zhao, Zeyu,Gao, Nan,Zeng, Zhi,et al. Improving diversity of speech-driven gesture generation with memory networks as dynamic dictionaries[J]. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY,2024:15. |
APA | Zhao, Zeyu,Gao, Nan,Zeng, Zhi,Zhang, Guixuan,Liu, Jie,&Zhang, Shuwu.(2024).Improving diversity of speech-driven gesture generation with memory networks as dynamic dictionaries.CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY,15. |
MLA | Zhao, Zeyu,et al."Improving diversity of speech-driven gesture generation with memory networks as dynamic dictionaries".CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY (2024):15. |
Ingest method: OAI harvesting
Source: Institute of Automation
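
The abstract describes a dictionary network that stores text-to-pose connections as key-value pairs, which the pose generation network looks up and merges with audio features. The following is a minimal sketch of that general idea only, not the authors' implementation. Every name here (`KeyValueMemory`, `write`, `read`, `merge`), the dot-product softmax lookup, the oldest-first eviction rule, and the merge by concatenation are assumptions made for illustration; the record does not specify any of them.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class KeyValueMemory:
    """Hypothetical memory of (text-feature key, pose-feature value) pairs."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.keys: List[List[float]] = []
        self.values: List[List[float]] = []

    def write(self, key: Sequence[float], value: Sequence[float]) -> None:
        # Assumed update rule: evict the oldest pair once the memory is full.
        if len(self.keys) >= self.capacity:
            self.keys.pop(0)
            self.values.pop(0)
        self.keys.append(list(key))
        self.values.append(list(value))

    def read(self, query: Sequence[float]) -> List[float]:
        # Assumed lookup: softmax over dot-product similarity between the
        # query and every key, then a weighted sum of the stored values.
        if not self.keys:
            raise ValueError("memory is empty")
        scores = [sum(q * k for q, k in zip(query, key)) for key in self.keys]
        weights = softmax(scores)
        dim = len(self.values[0])
        return [sum(w * v[i] for w, v in zip(weights, self.values))
                for i in range(dim)]


def merge(pose_feature: Sequence[float],
          audio_feature: Sequence[float]) -> List[float]:
    # Assumed merge: plain concatenation of the two feature vectors.
    return list(pose_feature) + list(audio_feature)


# Toy usage with made-up feature vectors.
memory = KeyValueMemory(capacity=2)
memory.write([1.0, 0.0], [0.5, 0.5, 0.0])  # text key -> pose value
memory.write([0.0, 1.0], [0.0, 0.2, 0.9])
matched = memory.read([10.0, 0.0])         # query closest to the first key
decoder_input = merge(matched, [0.1, 0.3])
```

In this toy run the query aligns with the first key, so the retrieved pose feature is almost exactly the first stored value, and the merged vector has five entries (three pose, two audio). A learned model would replace the hand-written vectors with encoder outputs and train the lookup end to end.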