ShiftFormer: Spatial-Temporal Shift Operation in Video Transformer
Document Type: Conference Paper
Authors | Beiying Yang2,3 |
Publication Date | 2023-07-10 |
Conference Dates | July 10–14, 2023 |
Conference Location | Brisbane, Australia |
Abstract | Transformers have achieved great success in various tasks; in particular, introducing pure Transformers into video understanding has shown powerful performance. However, video Transformers suffer from memory explosion: they are difficult to deploy on hardware because of their intensive computation. To address this issue, we propose the ST-shift (spatial-temporal shift) operation, which has zero computation and zero parameters: it merely shifts a small portion of the channels along the temporal and spatial dimensions. Based on this operation, we build an attention-free ShiftFormer, in which ST-shift blocks substitute for the attention layers of a video Transformer. ShiftFormer is accurate and efficient: it reduces memory usage by 56.34% and trains 3.41× faster. When both models use random initialization, ours outperforms Video Swin Transformer on Something-Something v2 video recognition. |
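To make the ST-shift idea concrete, below is a minimal PyTorch sketch of a zero-parameter spatial-temporal channel shift in the spirit the abstract describes. The `[B, T, H, W, C]` layout, the `fold_div` channel split, and the specific shift directions are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def st_shift(x: torch.Tensor, fold_div: int = 8) -> torch.Tensor:
    """Illustrative spatial-temporal channel shift (assumed design, not the paper's).

    x: video features of shape [B, T, H, W, C].
    fold_div: each shifted group uses C // fold_div channels.
    Zero parameters and no arithmetic; only memory movement, zero-padded at borders.
    """
    B, T, H, W, C = x.shape
    fold = C // fold_div
    out = torch.zeros_like(x)

    # Temporal shift: one channel group moves forward in time, one backward.
    out[:, 1:, :, :, :fold] = x[:, :-1, :, :, :fold]                      # shift +t
    out[:, :-1, :, :, fold:2 * fold] = x[:, 1:, :, :, fold:2 * fold]      # shift -t

    # Spatial shift: four groups move along the height and width axes.
    out[:, :, 1:, :, 2 * fold:3 * fold] = x[:, :, :-1, :, 2 * fold:3 * fold]  # shift +h
    out[:, :, :-1, :, 3 * fold:4 * fold] = x[:, :, 1:, :, 3 * fold:4 * fold]  # shift -h
    out[:, :, :, 1:, 4 * fold:5 * fold] = x[:, :, :, :-1, 4 * fold:5 * fold]  # shift +w
    out[:, :, :, :-1, 5 * fold:6 * fold] = x[:, :, :, 1:, 5 * fold:6 * fold]  # shift -w

    # Remaining channels pass through unchanged.
    out[..., 6 * fold:] = x[..., 6 * fold:]
    return out

# Example: 2 clips, 8 frames, 14x14 tokens, 96 channels.
x = torch.randn(2, 8, 14, 14, 96)
y = st_shift(x)
assert y.shape == x.shape
```

Because the operation is pure indexing, it adds no FLOPs and no learnable weights, which is consistent with the "zero computation and zero parameter" claim; replacing attention layers with such blocks is what removes the quadratic attention cost.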
Source URL | http://ir.ia.ac.cn/handle/173211/57295
Collection | Zidong Taichu Large Model Research Center – Large Model Computing
Corresponding Author | Beiying Yang
Author Affiliations | 1. Peng Cheng Laboratory; 2. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; 3. School of Artificial Intelligence, University of Chinese Academy of Sciences; 4. Wuhan AI Research
Recommended Citation (GB/T 7714) | Beiying Yang, Guibo Zhu, Guojing Ge, et al. ShiftFormer: Spatial-Temporal Shift Operation in Video Transformer[C]. Brisbane, Australia, July 10–14, 2023.
Ingest Method: OAI Harvesting
Source: Institute of Automation