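Chinese Academy of Sciences Institutional Repositories Grid: Research on Semantic Analysis Method of Image based on Natural Language Understanding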

Chinese Academy of Sciences Institutional Repositories Grid

Research on Semantic Analysis Method of Image based on Natural Language Understanding

Document type: Thesis


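Author	Wen Ya (温亚)
Degree type	Master's
Defense date	2017-05-24
Degree-granting institution	Shenyang Institute of Automation, Chinese Academy of Sciences
Place conferred	Shenyang
Advisor	Nan Lin (南琳)
Keywords	image description; computer vision; natural language understanding; deep learning; multimodal learning
Alternative title	Research on Semantic Analysis Method of Image based on Natural Language Understanding
Degree major	Mechanical Manufacturing and Automation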
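Chinese abstract (translated)	Automatic image description generation connects the two fields of computer vision and natural language processing and has long been a long-term goal of image understanding and artificial intelligence. It requires not only a deeper understanding of image semantics but also the generation of well-formed natural language to express them. In recent years, with growing computing power, richer data resources, and the development of deep learning, the task has made great progress, but it still faces many unsolved problems and challenges. This thesis comprehensively studies the problems related to automatic image description generation. First, it explains the relevant techniques from the vision and language fields, such as deep learning, language understanding, and multimodal learning. Second, it introduces in detail the most representative methods for the task. Third, building on a baseline model, we improve the model from two different angles: (1) we develop a deep bidirectional gated recurrent unit image description model that attempts, in the decoding stage, to fully mine the deeper semantics of the text description; (2) we propose a bidirectionally guided image description generation model that adds text information to guide image filtering in the image encoding stage and adds image attribute information to guide language generation in the text decoding stage, so that the model can mine the key information of image and text more fully and weaken the effect of imbalanced information conversion. Finally, the improved models are evaluated on the public benchmark MSCOCO. Whether measured by common evaluation metrics such as BLEU and METEOR or by other human evaluation metrics, the proposed methods show a fairly significant improvement over existing related work, which strongly supports the effectiveness of the models.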
English abstract	Automatically generating image descriptions connects computer vision and natural language processing, and it has long been a long-term goal of image understanding and artificial intelligence. It requires not only a deeper understanding of image semantics but also the generation of natural language to express them. In recent years, with the improvement of computing power, richer data resources, and the development of deep learning, the task has made great progress, but many unresolved problems and challenges remain. This thesis comprehensively studies the problems related to automatic image description generation. First, it explains the relevant techniques from the vision and language fields, such as deep learning, language understanding, and multimodal learning. Second, the representative methods for solving the task are described in detail. Third, building on a baseline model, we improve the model from two different perspectives. First, a deep bidirectional Gated Recurrent Unit image description model is developed, which tries to mine the deeper semantics of the text description in the decoding phase. Second, we propose a bidirectionally guided image description generation model: in the image encoding phase, text information is added to guide image filtering, and in the text decoding phase, image attribute information is added to guide language generation, so that the model can mine the key information of image and text more fully and weaken the impact of imbalanced information conversion. Finally, the performance of the improved models is evaluated on the public benchmark MSCOCO. The proposed methods show a significant improvement over existing work, whether measured by common evaluation metrics such as BLEU and METEOR or by other human evaluation metrics, which validates the effectiveness of the models.
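Language	Chinese
Ownership ranking	1
Source URL	[http://ir.sia.cn/handle/173321/20526]
Collection	Shenyang Institute of Automation_Digital Factory Research Laboratory
Recommended citation (GB/T 7714)	Wen Ya. Research on Semantic Analysis Method of Image based on Natural Language Understanding[D]. Shenyang. Shenyang Institute of Automation, Chinese Academy of Sciences. 2017.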

Ingest method: OAI harvesting

Source: Shenyang Institute of Automation

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.