Jointly Learning of Visual and Auditory: A New Approach for RS Image and Audio Cross-Modal Retrieval
Document type: Journal article
Authors | Guo, Mao1; Zhou, Chenghu1,2 |
Journal | IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING
Publication date | 2019-11-01 |
Volume | 12; Issue: 11; Pages: 4644-4654 |
Keywords | Convolutional neural network; cross-modal image retrieval; remote sensing; speech |
ISSN | 1939-1404 |
DOI | 10.1109/JSTARS.2019.2949220 |
Corresponding author | Guo, Mao (guomaoo@foxmail.com) |
Abstract (English) | Remote sensing (RS) images are widely used in civilian and military fields. With the rapidly increasing volume of image data, achieving fast and efficient RS image retrieval has become a challenging issue. However, the existing image retrieval methods, text-based or content-based, are still limited in their applications; for example, text input is inefficient, and a sample image for the query is often unavailable. It is known that speech is a natural and convenient way of communication. Therefore, a novel speech-image cross-modal retrieval approach, named deep visual-audio network (DVAN), is presented in this article, which can establish the direct relationship between image and speech from paired image-audio data. The model mainly has three parts: 1) image feature extraction, which is used to extract effective features of RS images; 2) audio feature learning, which is used to recognize key information from raw data, and AudioNet, as part of DVAN, is proposed to obtain more distinguishing features; 3) multimodal embedding, which is used to learn the direct correlations of the two modalities. Experimental results on an RS image audio dataset demonstrate that the proposed method is effective and that speech-image retrieval is feasible, and it provides a new way for faster and more convenient RS image retrieval. |
WOS keywords | NEURAL-NETWORKS; TEXTURE; SCALE; FEATURES; CODES |
Funding project | National Key Research and Development Program [2016YFF0103604] |
WOS research areas | Engineering; Physical Geography; Remote Sensing; Imaging Science & Photographic Technology |
Language | English |
WOS accession number | WOS:000508437700040 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
Funding agency | National Key Research and Development Program |
Source URL | http://ir.igsnrr.ac.cn/handle/311030/131391 |
Collection | Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences |
Author affiliations | 1. Nanjing Univ, Sch Geog & Ocean Sci, Collaborat Innovat Ctr South China Sea Studies, Nanjing 210023, Peoples R China; 2. Chinese Acad Sci, Inst Geog Sci & Nat Resource Res, Beijing 100101, Peoples R China; 3. Nanjing Univ Aeronaut & Astronaut, Sch Astronaut, Nanjing 210016, Peoples R China |
Recommended citation (GB/T 7714) | Guo, Mao, Zhou, Chenghu, Liu, Jiahang. Jointly Learning of Visual and Auditory: A New Approach for RS Image and Audio Cross-Modal Retrieval[J]. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2019, 12(11): 4644-4654. |
APA | Guo, Mao, Zhou, Chenghu, & Liu, Jiahang. (2019). Jointly Learning of Visual and Auditory: A New Approach for RS Image and Audio Cross-Modal Retrieval. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 12(11), 4644-4654. |
MLA | Guo, Mao, et al. "Jointly Learning of Visual and Auditory: A New Approach for RS Image and Audio Cross-Modal Retrieval." IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING 12.11 (2019): 4644-4654. |
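The abstract names a multimodal embedding stage that learns correlations between image and speech, but it does not specify the projection layers, similarity measure, or training loss. The Python sketch below illustrates one common way a shared-embedding retrieval step of this kind can work; it is not the DVAN implementation. It assumes image and audio features have already been extracted, uses hypothetical linear projections (`embed`, with identity weights in the toy demo), ranks images by cosine similarity, and shows a bidirectional hinge (margin) ranking loss, a widely used choice for paired cross-modal data. All function names, the toy vectors, and the margin value 0.2 are illustrative assumptions.

```python
import math

# Hypothetical sketch: features are assumed to be already extracted
# (e.g. image features from a CNN, audio features from an audio network).


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def embed(features, weights):
    """Project features into the shared space with a linear map, then L2-normalize.

    `weights` is a hypothetical (embed_dim x feature_dim) matrix given as a list of rows.
    """
    projected = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return l2_normalize(projected)


def cosine(a, b):
    """Cosine similarity; equals the dot product because inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


def rank_images(audio_emb, image_embs):
    """Return image indices sorted from most to least similar to the audio query."""
    scores = [cosine(audio_emb, img) for img in image_embs]
    return sorted(range(len(image_embs)), key=lambda i: scores[i], reverse=True)


def recall_at_k(audio_embs, image_embs, k):
    """Fraction of audio queries whose paired image (same index) is in the top k."""
    hits = sum(1 for i, a in enumerate(audio_embs) if i in rank_images(a, image_embs)[:k])
    return hits / len(audio_embs)


def margin_ranking_loss(audio_embs, image_embs, margin=0.2):
    """Bidirectional hinge ranking loss over non-matching pairs, averaged over the n matched pairs."""
    n = len(audio_embs)
    total = 0.0
    for i in range(n):
        pos = cosine(audio_embs[i], image_embs[i])
        for j in range(n):
            if j == i:
                continue
            # Audio query i should score its own image above image j.
            total += max(0.0, margin - pos + cosine(audio_embs[i], image_embs[j]))
            # Image query i should score its own audio above audio j.
            total += max(0.0, margin - pos + cosine(audio_embs[j], image_embs[i]))
    return total / n


# Toy usage with identity projections and hand-made 3-D features.
IDENTITY_3 = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
image_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
audio_feats = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.9]]
image_embs = [embed(f, IDENTITY_3) for f in image_feats]
audio_embs = [embed(f, IDENTITY_3) for f in audio_feats]
print(recall_at_k(audio_embs, image_embs, k=1))  # 1.0
```

Ranking by cosine similarity in a normalized shared space reduces query-time retrieval to one dot product per candidate image; in a trained system the projection weights would instead be learned by minimizing a loss such as `margin_ranking_loss` over paired image-audio data.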
Ingest method: OAI harvesting
Source: Institute of Geographic Sciences and Natural Resources Research
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.