Multimodal Image Aesthetic Prediction with Missing Modality
Document type: Journal article
Authors | Zhang, Xiaodan2; Song, Qiao2; Liu, Gang1 |
Journal | MATHEMATICS
Publication date | 2022-07 |
Volume | 10 Issue: 13 |
Keywords | image aesthetic quality assessment; multimodal learning; missing multimodal data; transformer |
ISSN | 2227-7390 |
DOI | 10.3390/math10132312 |
Institutional ownership rank | 2 |
Abstract (English) | With the rapid growth of multimedia data on the Internet, multimodal image aesthetic assessment has attracted a great deal of attention in the image processing community. However, traditional multimodal methods often have the following two problems: (1) Existing multimodal image aesthetic methods assume that all modalities are available for every sample, which is inapplicable in most cases because textual information is more difficult to obtain. (2) They fuse multimodal information only at a single level and ignore the interaction between modalities at different levels. To address these two challenges, we propose a novel framework termed Missing-Modality-Multimodal-BERT networks (MMMB). To achieve modality completeness, we first generate the missing textual modality conditioned on the available visual modality. We then project the image features into the token space of the text and use the transformer's self-attention mechanism to let information from the two modalities interact at different levels, giving earlier and more fine-grained fusion rather than fusion only at the final layer. Extensive experiments on two large benchmark datasets for image aesthetic quality evaluation, AVA and Photo.net, demonstrate that the proposed model significantly improves image aesthetic assessment performance under both the missing-text-modality condition and the full-modality condition. |
Language | English |
WOS accession number | WOS:000823883500001 |
Publisher | MDPI |
Source URL | [http://ir.opt.ac.cn/handle/181661/96060] |
Collection | Xi'an Institute of Optics and Precision Mechanics_Space Optics Application Research Laboratory |
Corresponding author | Liu, Gang |
Author affiliations | 1.Chinese Acad Sci, Xian Inst Opt & Precis Mech, Xian 710119, Peoples R China 2.Northwest Univ, Sci & Technol Informat Inst, Xian 710127, Peoples R China |
Recommended citation GB/T 7714 | Zhang, Xiaodan,Song, Qiao,Liu, Gang. Multimodal Image Aesthetic Prediction with Missing Modality[J]. MATHEMATICS,2022,10(13). |
APA | Zhang, Xiaodan,Song, Qiao,&Liu, Gang.(2022).Multimodal Image Aesthetic Prediction with Missing Modality.MATHEMATICS,10(13). |
MLA | Zhang, Xiaodan,et al."Multimodal Image Aesthetic Prediction with Missing Modality".MATHEMATICS 10.13(2022). |
Ingest method: OAI harvesting
Source: Xi'an Institute of Optics and Precision Mechanics