ShiftFormer: Spatial-Temporal Shift Operation in Video Transformer
Document Type: Conference Paper
Authors | Beiying Yang2,3 |
Publication Date | 2023-07-10 |
Conference Dates | July 10–14, 2023 |
Conference Location | Brisbane, Australia |
Abstract | Transformers have achieved great success in various tasks; in particular, introducing pure Transformers into video understanding has shown powerful performance. However, video Transformers suffer from memory explosion: they are difficult to deploy on hardware because of their intensive computation. To address this issue, we propose the ST-shift (spatial-temporal shift) operation, which has zero computation and zero parameters: it merely shifts a small portion of the channels along the temporal and spatial dimensions. Based on this operation, we build an attention-free ShiftFormer, in which ST-shift blocks substitute for the attention layers of a video Transformer. ShiftFormer is accurate and efficient: it reduces memory usage by 56.34% and trains 3.41× faster. When both models use random initialization, ours outperforms Video Swin Transformer on Something-Something v2 video recognition. |
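To make the ST-shift idea concrete, below is a minimal PyTorch sketch of a zero-parameter spatial-temporal channel shift in the spirit the abstract describes. The `[B, T, H, W, C]` layout, the `fold_div` channel split, and the specific shift directions are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def st_shift(x: torch.Tensor, fold_div: int = 8) -> torch.Tensor:
    """Illustrative spatial-temporal channel shift (assumed design, not the paper's).

    x: video features of shape [B, T, H, W, C].
    fold_div: each shifted group uses C // fold_div channels.
    Zero parameters and no arithmetic; only memory movement, zero-padded at borders.
    """
    B, T, H, W, C = x.shape
    fold = C // fold_div
    out = torch.zeros_like(x)

    # Temporal shift: one channel group moves forward in time, one backward.
    out[:, 1:, :, :, :fold] = x[:, :-1, :, :, :fold]                      # shift +t
    out[:, :-1, :, :, fold:2 * fold] = x[:, 1:, :, :, fold:2 * fold]      # shift -t

    # Spatial shift: four groups move along the height and width axes.
    out[:, :, 1:, :, 2 * fold:3 * fold] = x[:, :, :-1, :, 2 * fold:3 * fold]  # shift +h
    out[:, :, :-1, :, 3 * fold:4 * fold] = x[:, :, 1:, :, 3 * fold:4 * fold]  # shift -h
    out[:, :, :, 1:, 4 * fold:5 * fold] = x[:, :, :, :-1, 4 * fold:5 * fold]  # shift +w
    out[:, :, :, :-1, 5 * fold:6 * fold] = x[:, :, :, 1:, 5 * fold:6 * fold]  # shift -w

    # Remaining channels pass through unchanged.
    out[..., 6 * fold:] = x[..., 6 * fold:]
    return out

# Example: 2 clips, 8 frames, 14x14 tokens, 96 channels.
x = torch.randn(2, 8, 14, 14, 96)
y = st_shift(x)
assert y.shape == x.shape
```

Because the operation is pure indexing, it adds no FLOPs and no learnable weights, which is consistent with the "zero computation and zero parameter" claim; replacing attention layers with such blocks is what removes the quadratic attention cost.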
Source URL | http://ir.ia.ac.cn/handle/173211/57295
Collection | Zidong Taichu Large Model Research Center – Large Model Computing
Corresponding Author | Beiying Yang
Author Affiliations | 1. Peng Cheng Laboratory; 2. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; 3. School of Artificial Intelligence, University of Chinese Academy of Sciences; 4. Wuhan AI Research
Recommended Citation (GB/T 7714) | Beiying Yang, Guibo Zhu, Guojing Ge, et al. ShiftFormer: Spatial-Temporal Shift Operation in Video Transformer[C]. Brisbane, Australia, July 10–14, 2023.
Ingest Method: OAI Harvesting
Source: Institute of Automation