Chinese Academy of Sciences Institutional Repositories Grid
Jointly Learning of Visual and Auditory: A New Approach for RS Image and Audio Cross-Modal Retrieval

Document type: Journal article

Authors: Guo, Mao [1]; Zhou, Chenghu [1,2]; Liu, Jiahang [3]
Journal: IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING
Publication date: 2019-11-01
Volume: 12; Issue: 11; Pages: 4644-4654
Keywords: Convolutional neural network; cross-modal; image retrieval; remote sensing; speech
ISSN: 1939-1404
DOI: 10.1109/JSTARS.2019.2949220
Corresponding author: Guo, Mao (guomaoo@foxmail.com)
Abstract: Remote sensing (RS) images are widely used in civilian and military fields. As the volume of image data grows rapidly, achieving fast and efficient RS image retrieval has become a challenging issue. However, existing image retrieval methods, whether text-based or content-based, remain limited in practice; for example, text input is inefficient, and a sample image for the query is often unavailable. Speech is a natural and convenient means of communication. Therefore, a novel speech-image cross-modal retrieval approach, named deep visual-audio network (DVAN), is presented in this article; it establishes a direct relationship between image and speech from paired image-audio data. The model has three main parts: 1) image feature extraction, which extracts effective features of RS images; 2) audio feature learning, which recognizes key information from raw data, with AudioNet, as part of DVAN, proposed to obtain more distinguishing features; 3) multimodal embedding, which learns the direct correlations between the two modalities. Experimental results on an RS image-audio dataset demonstrate that the proposed method is effective and that speech-image retrieval is feasible, providing a new way toward faster and more convenient RS image retrieval.
WOS keywords: NEURAL-NETWORKS ; TEXTURE ; SCALE ; FEATURES ; CODES
Funding project: National Key Research and Development Program [2016YFF0103604]
WOS research areas: Engineering ; Physical Geography ; Remote Sensing ; Imaging Science & Photographic Technology
Language: English
WOS accession number: WOS:000508437700040
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Funding agency: National Key Research and Development Program
Source URL: [http://ir.igsnrr.ac.cn/handle/311030/131391]
Collection: Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences
Corresponding author: Guo, Mao
Author affiliations:
1. Nanjing Univ, Sch Geog & Ocean Sci, Collaborat Innovat Ctr South China Sea Studies, Nanjing 210023, Peoples R China
2. Chinese Acad Sci, Inst Geog Sci & Nat Resource Res, Beijing 100101, Peoples R China
3. Nanjing Univ Aeronaut & Astronaut, Sch Astronaut, Nanjing 210016, Peoples R China
Recommended citation
GB/T 7714
Guo, Mao, Zhou, Chenghu, Liu, Jiahang. Jointly Learning of Visual and Auditory: A New Approach for RS Image and Audio Cross-Modal Retrieval[J]. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2019, 12(11): 4644-4654.
APA Guo, Mao, Zhou, Chenghu, & Liu, Jiahang. (2019). Jointly Learning of Visual and Auditory: A New Approach for RS Image and Audio Cross-Modal Retrieval. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 12(11), 4644-4654.
MLA Guo, Mao, et al. "Jointly Learning of Visual and Auditory: A New Approach for RS Image and Audio Cross-Modal Retrieval". IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING 12.11 (2019): 4644-4654.

Ingest method: OAI harvesting

Source: Institute of Geographic Sciences and Natural Resources Research


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.