Chinese Academy of Sciences Institutional Repositories Grid
A Reconstruction-based Visual-Acoustic-Semantic Embedding Method for Speech-Image Retrieval

Document type: Journal article

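Authors: Cheng, Wenlong [3,4]; Tang, Wei [3,4]; Huang, Yan [3,4]; Luo, Yiwen [1]; Wang, Liang [2,3,4]
Journal: IEEE Transactions on Multimedia
Publication date: 2022
Pages: 14
Ownership ranking: 1
Document subtype: International journal
Abstract (English)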

Speech-image retrieval aims to learn the relevance between images and speech. Prior approaches are mainly based on bi-modal contrastive learning, which cannot adequately alleviate the cross-modal heterogeneity between the visual and acoustic modalities. To address this issue, we propose a visual-acoustic-semantic embedding (VASE) method. First, we propose a tri-modal ranking loss that takes advantage of the semantic information corresponding to the acoustic data, introducing an auxiliary alignment to enhance the alignment between image and speech. Second, we introduce a cycle-consistency loss based on feature reconstruction, which further alleviates the heterogeneity between different data modalities (e.g., visual-acoustic, visual-textual, and acoustic-textual). Extensive experiments demonstrate the effectiveness of the proposed method. In addition, our VASE model achieves state-of-the-art performance on the speech-image retrieval task on the Flickr8K and Places datasets.

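Language: English
Source URL: http://ir.ia.ac.cn/handle/173211/48532
Collection: Institute of Automation_Center for Research on Intelligent Perception and Computing
Corresponding author: Wang, Liang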
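Author affiliations:
1. Xi'an Jiaotong University, Institute of Artificial Intelligence and Robotics
2. Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences
3. Institute of Automation, Chinese Academy of Sciences, Center for Research on Intelligent Perception and Computing
4. University of Chinese Academy of Sciences
Recommended citation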
GB/T 7714
Cheng, Wenlong, Tang, Wei, Huang, Yan, et al. A Reconstruction-based Visual-Acoustic-Semantic Embedding Method for Speech-Image Retrieval[J]. IEEE Transactions on Multimedia, 2022: 14.
APA Cheng, Wenlong, Tang, Wei, Huang, Yan, Luo, Yiwen, & Wang, Liang. (2022). A Reconstruction-based Visual-Acoustic-Semantic Embedding Method for Speech-Image Retrieval. IEEE Transactions on Multimedia, 14.
MLA Cheng, Wenlong, et al. "A Reconstruction-based Visual-Acoustic-Semantic Embedding Method for Speech-Image Retrieval." IEEE Transactions on Multimedia (2022): 14.

Ingestion method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.