Chinese Academy of Sciences Institutional Repositories Grid
ShiftFormer: Spatial-Temporal Shift Operation in Video Transformer

Document Type: Conference Paper

Authors: Beiying Yang2,3; Guibo Zhu2,3,4; Guojing Ge2; Jinzhao Luo2,3; Jinqiao Wang1,2,3,4
Publication Date: 2023-07-10
Conference Date: July 10 to July 14, 2023
Conference Venue: Brisbane, Australia
Abstract
Transformers have achieved great success in various tasks; in particular, pure Transformers introduced into video understanding show powerful performance. However, video Transformers suffer from memory explosion: they are difficult to deploy on hardware due to their intensive computation. To address this issue, we propose the ST-shift (spatial-temporal shift) operation, which requires zero computation and zero parameters: only a small portion of the channels is shifted along the temporal and spatial dimensions. Based on this operation, we build an attention-free ShiftFormer, in which ST-shift blocks substitute for the attention layers of the video Transformer. ShiftFormer is accurate and efficient: it reduces memory usage by 56.34% and trains 3.41× faster. When both models use random initialization, ours even outperforms Video Swin Transformer for video recognition on Something-Something v2.
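The channel-shift idea summarized in the abstract can be sketched as follows. This is a minimal illustration in NumPy, assuming a (T, H, W, C) feature layout and a hypothetical shift fraction; the paper's exact shift configuration is not reproduced here.

```python
import numpy as np

def st_shift(x, shift_frac=0.125):
    """Zero-parameter spatial-temporal shift (illustrative sketch).

    x: video features of shape (T, H, W, C).
    Small groups of channels are shifted by one step along the
    temporal, height, and width dimensions; vacated positions are
    zero-filled, and the remaining channels pass through unchanged.
    """
    T, H, W, C = x.shape
    n = max(1, int(C * shift_frac))  # channels per shift group
    out = np.zeros_like(x)

    # Temporal shifts: one group looks one frame back, another one frame ahead.
    out[1:, :, :, :n] = x[:-1, :, :, :n]           # shift forward in time
    out[:-1, :, :, n:2*n] = x[1:, :, :, n:2*n]     # shift backward in time

    # Spatial shifts: the next two groups shift along H and W by one position.
    out[:, 1:, :, 2*n:3*n] = x[:, :-1, :, 2*n:3*n]  # shift down
    out[:, :, 1:, 3*n:4*n] = x[:, :, :-1, 3*n:4*n]  # shift right

    # Remaining channels are copied through untouched.
    out[..., 4*n:] = x[..., 4*n:]
    return out
```

Because the operation is pure memory movement (slicing and copying), it adds no learnable parameters and essentially no FLOPs, which is what allows ST-shift blocks to replace attention layers cheaply.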
Source URL: http://ir.ia.ac.cn/handle/173211/57295
Collection: Zidong Taichu Large Model Research Center_Large Model Computing
Corresponding Author: Beiying Yang
Author Affiliations:
1.Peng Cheng Laboratory
2.National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
3.School of Artificial Intelligence, University of Chinese Academy of Sciences
4.Wuhan AI Research
Recommended Citation
GB/T 7714
Beiying Yang, Guibo Zhu, Guojing Ge, et al. ShiftFormer: Spatial-Temporal Shift Operation in Video Transformer[C]. Brisbane, Australia, July 10 to July 14, 2023.

Deposit Method: OAI harvesting

Source: Institute of Automation


Unless otherwise noted, all content in this system is protected by copyright, and all rights are reserved.