Transformers in computational visual media: A survey
Document type | Journal article
Authors | Xu, Yifan 2,3; Wei, Huapeng 1; Lin, Minxuan 2,3; Deng, Yingying 2,3; Sheng, Kekai 4; Zhang, Mengdan 4; Tang, Fan 1; Dong, Weiming 2,3,5; Huang, Feiyue 4; Xu, Changsheng 2,3,5
Journal | Computational Visual Media
Publication date | 2021-10-27
Volume | 8, Issue 1, Pages 33-62
ISSN | 2096-0433
Keywords | visual transformer; computational visual media (CVM); high-level vision; low-level vision; image generation; multi-modal learning
DOI | 10.1007/s41095-021-0247-3 |
Corresponding author | Dong, Weiming (weiming.dong@ia.ac.cn)
Abstract | Transformers, the dominant architecture for natural language processing, have also recently attracted much attention from computational visual media researchers due to their capacity for long-range representation and high performance. Transformers are sequence-to-sequence models, which use a self-attention mechanism rather than the RNN sequential structure. Thus, such models can be trained in parallel and can represent global information. This study comprehensively surveys recent visual transformer works. We categorize them according to task scenario: backbone design, high-level vision, low-level vision and generation, and multimodal learning. Their key ideas are also analyzed. Differing from previous surveys, we mainly focus on visual transformer methods in low-level vision and generation. The latest works on backbone design are also reviewed in detail. For ease of understanding, we precisely describe the main contributions of the latest works in the form of tables. As well as giving quantitative comparisons, we also present image results for low-level vision and generation tasks. Computational costs and source code links for various important works are also given in this survey to assist further development.
WOS keywords | NETWORKS
Funding projects | National Key R&D Program of China [2020AAA0106200]; National Natural Science Foundation of China [61832016]; National Natural Science Foundation of China [U20B2070]
WOS research area | Computer Science
Language | English
Publisher | Tsinghua University Press
WOS record no. | BMC:10.1007/S41095-021-0247-3
Funding agencies | National Key R&D Program of China; National Natural Science Foundation of China
Source URL | http://ir.ia.ac.cn/handle/173211/46112
Collection | Institute of Automation, National Laboratory of Pattern Recognition, Multimedia Computing and Graphics Team
Affiliations | 1. Jilin University, School of Artificial Intelligence; 2. Chinese Academy of Sciences, NLPR, Institute of Automation; 3. University of Chinese Academy of Sciences, School of Artificial Intelligence; 4. Youtu Lab, Tencent Inc.; 5. CASIA-LLVISION Joint Lab
Recommended citation (GB/T 7714) | Xu, Yifan, Wei, Huapeng, Lin, Minxuan, et al. Transformers in computational visual media: A survey[J]. Computational Visual Media, 2021, 8(1): 33-62.
APA | Xu, Yifan., Wei, Huapeng., Lin, Minxuan., Deng, Yingying., Sheng, Kekai., ... & Xu, Changsheng. (2021). Transformers in computational visual media: A survey. Computational Visual Media, 8(1), 33-62.
MLA | Xu, Yifan, et al. "Transformers in computational visual media: A survey". Computational Visual Media 8.1 (2021): 33-62.
Ingestion method: OAI harvesting
Source: Institute of Automation
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.