Chinese Academy of Sciences Institutional Repositories Grid
Transformers in computational visual media: A survey

Document Type: Journal Article

Authors: Xu, Yifan (2,3); Wei, Huapeng (1); Lin, Minxuan (2,3); Deng, Yingying (2,3); Sheng, Kekai (4); Zhang, Mengdan (4); Tang, Fan (1); Dong, Weiming (2,3,5); Huang, Feiyue (4); Xu, Changsheng (2,3,5)
Journal: Computational Visual Media
Publication Date: 2021-10-27
Volume: 8, Issue: 1, Pages: 33-62
ISSN: 2096-0433
Keywords: visual transformer; computational visual media (CVM); high-level vision; low-level vision; image generation; multi-modal learning
DOI: 10.1007/s41095-021-0247-3
Corresponding Author: Dong, Weiming (weiming.dong@ia.ac.cn)
Abstract: Transformers, the dominant architecture for natural language processing, have also recently attracted much attention from computational visual media researchers due to their capacity for long-range representation and high performance. Transformers are sequence-to-sequence models that use a self-attention mechanism rather than the RNN sequential structure. Thus, such models can be trained in parallel and can represent global information. This study comprehensively surveys recent visual transformer works. We categorize them according to task scenario: backbone design, high-level vision, low-level vision and generation, and multimodal learning. Their key ideas are also analyzed. Differing from previous surveys, we mainly focus on visual transformer methods in low-level vision and generation. The latest works on backbone design are also reviewed in detail. For ease of understanding, we precisely describe the main contributions of the latest works in the form of tables. As well as giving quantitative comparisons, we also present image results for low-level vision and generation tasks. Computational costs and source code links for various important works are also given in this survey to assist further development.
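The abstract contrasts self-attention with the RNN's sequential recurrence: every token attends to every other token in one step, so computation parallelizes and each output carries global context. The following is a minimal NumPy sketch of scaled dot-product self-attention for illustration only; it is not code from the surveyed works, and the names and shapes (n_tokens, d_model, d_k) are assumptions made for the example.

# Minimal scaled dot-product self-attention sketch (illustrative; not from the paper).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (n_tokens, d_model); Wq, Wk, Wv: (d_model, d_k)
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = (Q @ K.T) / np.sqrt(K.shape[-1])  # every token scores every other token
    return softmax(scores, axis=-1) @ V        # weighted sum gives each token global context

rng = np.random.default_rng(0)
n_tokens, d_model, d_k = 4, 8, 8               # assumed toy sizes
X = rng.normal(size=(n_tokens, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)     # (4, 8)

Because the score matrix couples all token pairs at once, no step depends on the previous token's output, which is the parallelism and long-range representation the abstract refers to.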
WOS Keywords: NETWORKS
Funding Projects: National Key R&D Program of China [2020AAA0106200]; National Natural Science Foundation of China [61832016]; National Natural Science Foundation of China [U20B2070]
WOS Research Area: Computer Science
Language: English
Publisher: Tsinghua University Press
WOS Record Number: BMC:10.1007/S41095-021-0247-3
Funding Agencies: National Key R&D Program of China; National Natural Science Foundation of China
Source URL: http://ir.ia.ac.cn/handle/173211/46112
Collection: Institute of Automation, National Laboratory of Pattern Recognition, Multimedia Computing and Graphics Group
作者单位1.Jilin University; School of Artificial Intelligence
2.Chinese Academy of Sciences; NLPR, Institute of Automation
3.University of Chinese Academy of Sciences; School of Artificial Intelligence
4.Youtu Lab, Tencent Inc.
5.CASIA-LLVISION Joint Lab
Recommended Citation Formats
GB/T 7714
Xu, Yifan, Wei, Huapeng, Lin, Minxuan, et al. Transformers in computational visual media: A survey[J]. Computational Visual Media, 2021, 8(1): 33-62.
APA: Xu, Yifan, Wei, Huapeng, Lin, Minxuan, Deng, Yingying, Sheng, Kekai, ... & Xu, Changsheng. (2021). Transformers in computational visual media: A survey. Computational Visual Media, 8(1), 33-62.
MLA: Xu, Yifan, et al. "Transformers in computational visual media: A survey". Computational Visual Media 8.1 (2021): 33-62.

Deposit Method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.