Comprehensive Relation Modelling for Image Paragraph Generation
Document type: Journal article
Authors | Xianglu Zhu1,2; Zhang Zhang2,3 |
Journal | Machine Intelligence Research |
Publication date | 2024 |
Volume | 21 Issue: 2 Pages: 369-382 |
Keywords | Image paragraph generation, visual relationship, scene graph, graph convolutional network (GCN), long short-term memory |
ISSN | 2731-538X |
DOI | 10.1007/s11633-022-1408-2 |
Abstract | Image paragraph generation aims to generate a long description composed of multiple sentences, unlike traditional image captioning, which produces only one sentence. Most previous methods are dedicated to extracting rich features from image regions and ignore modelling the visual relationships. In this paper, we propose a novel method to generate a paragraph by modelling visual relationships comprehensively. First, we parse an image into a scene graph, where each node represents a specific object and each edge denotes the relationship between two objects. Second, we enrich the object features by implicitly encoding visual relationships through a graph convolutional network (GCN). We further explore high-order relations between different relation features using another graph convolutional network. In addition, we obtain the linguistic features by projecting the predicted object labels and their relationships into a semantic embedding space. With these features, we present an attention-based topic generation network to select relevant features and produce a set of topic vectors, which are then utilized to generate multiple sentences. We evaluate the proposed method on the Stanford image-paragraph dataset, which is currently the only available dataset for image paragraph generation, and our method achieves competitive performance in comparison with other state-of-the-art (SOTA) methods. |
Source URL | [http://ir.ia.ac.cn/handle/173211/56044] |
Collection | Institute of Automation_Academic Journals_International Journal of Automation and Computing |
Author affiliations | 1. Automation Department, University of Science and Technology of China, Hefei 230027, China; 2. Center for Research on Intelligent Perception and Computing, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; 3. University of Chinese Academy of Sciences, Beijing 100864, China |
Recommended citation (GB/T 7714) | Xianglu Zhu, Zhang Zhang, Wei Wang, et al. Comprehensive Relation Modelling for Image Paragraph Generation[J]. Machine Intelligence Research, 2024, 21(2): 369-382. |
APA | Xianglu Zhu, Zhang Zhang, Wei Wang, & Zilei Wang. (2024). Comprehensive Relation Modelling for Image Paragraph Generation. Machine Intelligence Research, 21(2), 369-382. |
MLA | Xianglu Zhu, et al. "Comprehensive Relation Modelling for Image Paragraph Generation". Machine Intelligence Research 21.2 (2024): 369-382. |
Ingest method: OAI harvesting
Source: Institute of Automation
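
The abstract's second step, enriching object features by passing them through a graph convolutional network over the scene graph, can be illustrated with a minimal sketch. This is not the authors' implementation: the mean-over-neighbours aggregation, the undirected treatment of edges, the toy objects, and the names `matmul` and `gcn_layer` are all assumptions chosen for illustration, and the record gives no details of the actual layer design.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, to keep the sketch dependency-free."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(features: Matrix, edges: List[Tuple[int, int]], weight: Matrix) -> Matrix:
    """One graph-convolution step (assumed form): average each node with its
    neighbours, apply a linear map, then ReLU. Edges are treated as
    undirected, which simplifies a directed scene graph."""
    n = len(features)
    # Adjacency with self-loops so each node keeps its own feature.
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for subj, obj in edges:
        adj[subj][obj] = 1.0
        adj[obj][subj] = 1.0
    # Row-normalise so aggregation is a mean over the neighbourhood.
    norm = [[v / sum(row) for v in row] for row in adj]
    hidden = matmul(matmul(norm, features), weight)
    return [[max(0.0, v) for v in row] for row in hidden]


# Hypothetical toy scene graph: 0 = "man", 1 = "horse", 2 = "field".
objects = ["man", "horse", "field"]
edges = [(0, 1), (1, 2)]            # e.g. man-riding-horse, horse-on-field
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]   # identity weight, for readability

enriched = gcn_layer(features, edges, weight)
for name, vec in zip(objects, enriched):
    print(name, vec)
```

With the identity weight, each output row is simply the mean of a node's own feature and those of its scene-graph neighbours, which shows how relationship structure leaks into the object representations. A trained model would learn `weight` and would likely distinguish edge direction and relation type.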
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.