Adaptive Search for Broad Attention based Vision Transformers
Document Type: Journal Article
Authors | Nannan Li 1,2; Yaran Chen; Dongbin Zhao |
Journal | IEEE Transactions on Evolutionary Computation |
Publication Date | 2023 |
Pages | 0-0 |
Abstract | In recent years, Vision Transformer (ViT) has prevailed among computer vision tasks for its powerful capability of image representation. Frustratingly, the manual design of efficient architectures for ViTs can be time-consuming, often requiring repetitive trial and error. Moreover, existing lightweight ViTs have not been thoroughly explored, leading to weaker performance compared to convolutional neural networks. To address these challenges, we propose Adaptive Search for Broad attention based Vision Transformers, called ASB, which incorporates broad attention and adaptive neural architecture evolution to strengthen lightweight ViTs. The inclusion of broad attention within the search space allows us to explore novel architectures that can significantly enhance the performance of lightweight ViTs by providing more comprehensive attention information. We also design an efficient adaptive evolutionary algorithm to explore effective architectures by dynamically adjusting the probability distribution of candidate mutation operators. Our experimental results show that the adaptive evolution in ASB can efficiently learn excellent lightweight models, achieving a 55% improvement in convergence speed over traditional evolutionary algorithms. Moreover, the effectiveness of ASB is demonstrated in several visual tasks, including image classification, mobile COCO panoptic segmentation, and mobile ADE20K semantic segmentation. For instance, on ImageNet, the searched model achieves 77.8% accuracy with 6.5M parameters, a 0.7% accuracy improvement over the state-of-the-art EfficientNet-B0. On mobile COCO panoptic segmentation, our approach outperforms the prevalent MobileNetV2 by 7.4% PQ. On mobile ADE20K semantic segmentation, our method attains 40.9% mIoU, exceeding MobileNetV2 by 6.9% mIoU. |
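The abstract describes adaptive evolution as dynamically adjusting the probability distribution over candidate mutation operators. The sketch below is a minimal, hypothetical illustration of that idea: operator names, the success/trial bookkeeping, and the smoothed success-ratio update are illustrative assumptions, not the exact ASB formulation from the paper.

```python
import random

# Hypothetical sketch: select mutation operators with probabilities updated
# from how often each operator's offspring improved over its parent.
OPERATORS = ["widen_attention", "change_depth", "change_mlp_ratio", "change_heads"]

class AdaptiveMutation:
    def __init__(self, operators, smoothing=1.0):
        self.operators = list(operators)
        # Start counts at `smoothing` so every operator keeps a nonzero
        # selection probability (Laplace-style smoothing).
        self.success = {op: smoothing for op in self.operators}
        self.trials = {op: smoothing for op in self.operators}

    def probabilities(self):
        # Normalize the smoothed success ratios into a distribution.
        scores = {op: self.success[op] / self.trials[op] for op in self.operators}
        total = sum(scores.values())
        return {op: s / total for op, s in scores.items()}

    def sample(self):
        probs = self.probabilities()
        return random.choices(self.operators,
                              weights=[probs[op] for op in self.operators])[0]

    def update(self, op, improved):
        # Record whether the offspring produced by `op` beat its parent's fitness.
        self.trials[op] += 1
        if improved:
            self.success[op] += 1
```

In such a scheme, operators that recently produced fitter architectures are sampled more often in later generations, which is one common way an evolutionary search can adapt its mutation distribution on the fly.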
Language | English |
Source URL | http://ir.ia.ac.cn/handle/173211/52214 |
Collection | State Key Laboratory of Management and Control for Complex Systems – Deep Reinforcement Learning |
Author Affiliations | 1. School of Artificial Intelligence, University of Chinese Academy of Sciences; 2. The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences |
Recommended Citation (GB/T 7714) | Nannan Li, Yaran Chen, Dongbin Zhao. Adaptive Search for Broad Attention based Vision Transformers[J]. IEEE Transactions on Evolutionary Computation, 2023: 0-0. |
APA | Nannan Li, Yaran Chen, & Dongbin Zhao. (2023). Adaptive Search for Broad Attention based Vision Transformers. IEEE Transactions on Evolutionary Computation, 0-0. |
MLA | Nannan Li, et al. "Adaptive Search for Broad Attention based Vision Transformers". IEEE Transactions on Evolutionary Computation (2023): 0-0. |
Deposit Method: OAI harvesting
Source: Institute of Automation