Transformers in computational visual media: A survey
Document type | Journal article
Authors | Xu, Yifan 2,3; Wei, Huapeng 1; Lin, Minxuan 2,3; Deng, Yingying 2,3; Sheng, Kekai 4; Zhang, Mengdan 4; Tang, Fan 1; Dong, Weiming 2,3,5; Huang, Feiyue 4; Xu, Changsheng 2,3,5
Journal | Computational Visual Media
Publication date | 2021-10-27
Volume | 8, Issue 1, Pages 33-62
ISSN | 2096-0433
Keywords | visual transformer; computational visual media (CVM); high-level vision; low-level vision; image generation; multi-modal learning
DOI | 10.1007/s41095-021-0247-3 |
Corresponding author | Dong, Weiming (weiming.dong@ia.ac.cn)
Abstract | Transformers, the dominant architecture for natural language processing, have also recently attracted much attention from computational visual media researchers due to their capacity for long-range representation and high performance. Transformers are sequence-to-sequence models, which use a self-attention mechanism rather than the RNN sequential structure. Thus, such models can be trained in parallel and can represent global information. This study comprehensively surveys recent visual transformer works. We categorize them according to task scenario: backbone design, high-level vision, low-level vision and generation, and multimodal learning. Their key ideas are also analyzed. Differing from previous surveys, we mainly focus on visual transformer methods in low-level vision and generation. The latest works on backbone design are also reviewed in detail. For ease of understanding, we precisely describe the main contributions of the latest works in the form of tables. As well as giving quantitative comparisons, we also present image results for low-level vision and generation tasks. Computational costs and source code links for various important works are also given in this survey to assist further development.
WOS keywords | NETWORKS
Funding projects | National Key R&D Program of China [2020AAA0106200]; National Natural Science Foundation of China [61832016]; National Natural Science Foundation of China [U20B2070]
WOS research area | Computer Science
Language | English
Publisher | Tsinghua University Press
WOS record no. | BMC:10.1007/S41095-021-0247-3
Funding agencies | National Key R&D Program of China; National Natural Science Foundation of China
Source URL | http://ir.ia.ac.cn/handle/173211/46112
Collection | Institute of Automation, National Laboratory of Pattern Recognition, Multimedia Computing and Graphics Team
Affiliations | 1. Jilin University, School of Artificial Intelligence; 2. Chinese Academy of Sciences, NLPR, Institute of Automation; 3. University of Chinese Academy of Sciences, School of Artificial Intelligence; 4. Youtu Lab, Tencent Inc.; 5. CASIA-LLVISION Joint Lab
Recommended citation (GB/T 7714) | Xu, Yifan, Wei, Huapeng, Lin, Minxuan, et al. Transformers in computational visual media: A survey[J]. Computational Visual Media, 2021, 8(1): 33-62.
APA | Xu, Yifan., Wei, Huapeng., Lin, Minxuan., Deng, Yingying., Sheng, Kekai., ... & Xu, Changsheng. (2021). Transformers in computational visual media: A survey. Computational Visual Media, 8(1), 33-62.
MLA | Xu, Yifan, et al. "Transformers in computational visual media: A survey". Computational Visual Media 8.1 (2021): 33-62.
Ingestion method: OAI harvesting
Source: Institute of Automation
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.