Chinese Academy of Sciences Institutional Repositories Grid
Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification

Document type: Journal article

Authors: Ji, Ruyi (4,5); Li, Jiaying (3); Zhang, Libo (5); Liu, Jing (1,2); Wu, Yanjun (5)
Journal: IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
Publication date: 2023-09-01
Volume: 33; Issue: 9; Pages: 5009-5021
ISSN: 1051-8215
Keywords: Transformer; multi-grained assembly; fine-grained visual classification
DOI: 10.1109/TCSVT.2023.3248791
Corresponding author: Zhang, Libo (libo@iscas.ac.cn)
Abstract: Fine-grained visual classification (FGVC) requires distinguishing sub-categories within the same super-category, a task made difficult by small inter-class and large intra-class variances. To improve FGVC performance, this paper proposes a novel dual Transformer framework (coined Dual-TR) with multi-grained assembly. Dual-TR encodes fine-grained objects with two parallel hierarchies, which makes it amenable to capturing subtle yet discriminative cues via the self-attention mechanism in ViT. Specifically, we perform orthogonal multi-grained assembly within the Transformer structure for a more robust representation, i.e., intra-layer and inter-layer assembly. The former explores the informative features in the various self-attention heads within a Transformer layer. The latter attends to token assembly across Transformer layers. Meanwhile, we introduce a center loss constraint to pull intra-class samples together and push inter-class samples apart. Extensive experiments show that Dual-TR performs on par with the state-of-the-art methods on four public benchmarks: CUB-200-2011, NABirds, iNaturalist2017, and Stanford Dogs. Comprehensive ablation studies further demonstrate the effectiveness of the architectural design choices.
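The center loss constraint mentioned in the abstract can be illustrated with a minimal sketch. This is the commonly used formulation, which penalizes the squared distance between each feature vector and the center of its class. It is not taken from the paper, whose exact variant (including any term that pushes classes apart) is not given in this record. The function names `center_loss` and `class_means`, the averaging over the batch, and the use of class means as centers are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def class_means(features: Sequence[Sequence[float]],
                labels: Sequence[int]) -> Dict[int, List[float]]:
    """Hypothetical helper: use the per-class mean feature as the class center."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def center_loss(features: Sequence[Sequence[float]],
                labels: Sequence[int],
                centers: Dict[int, Sequence[float]]) -> float:
    """Batch mean of 0.5 * ||x_i - c_{y_i}||^2 (assumed formulation).

    This term only pulls samples toward their own class center; it does
    not by itself push different classes apart.
    """
    if not features or len(features) != len(labels):
        raise ValueError("features and labels must be non-empty and aligned")
    total = 0.0
    for x, y in zip(features, labels):
        c = centers[y]
        total += 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return total / len(features)


# Toy example: two samples of class 0, one sample of class 1.
feats = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
labs = [0, 0, 1]
cents = class_means(feats, labs)
loss = center_loss(feats, labs, cents)
```

In training, such a term is typically added to the classification loss with a weighting factor, and the centers are learned or updated incrementally rather than recomputed from each batch; the sketch recomputes them only to stay self-contained.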
Funding projects: Key Research Program of Frontier Sciences, CAS [ZDBSLY-JSC038]; CAAI-Huawei MindSpore Open Fund and Youth Innovation Promotion Association, CAS [2020111]
WOS research area: Engineering
Language: English
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
WOS accession number: WOS:001063316800042
Funding organizations: Key Research Program of Frontier Sciences, CAS; CAAI-Huawei MindSpore Open Fund and Youth Innovation Promotion Association, CAS
Source URL: http://ir.ia.ac.cn/handle/173211/53131
Collection: Zidong Taichu Large Model Research Center
Author affiliations:
1. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 101400, Peoples R China
3. Beijing Informat Sci & Technol Univ, Sch Comp Sci, Beijing 100192, Peoples R China
4. Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 101400, Peoples R China
5. Chinese Acad Sci, State Key Lab Comp Sci, Inst Software, Beijing 100190, Peoples R China
Recommended citation:
GB/T 7714: Ji, Ruyi, Li, Jiaying, Zhang, Libo, et al. Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification[J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33(9): 5009-5021.
APA: Ji, Ruyi, Li, Jiaying, Zhang, Libo, Liu, Jing, & Wu, Yanjun. (2023). Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 33(9), 5009-5021.
MLA: Ji, Ruyi, et al. "Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification". IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 33.9 (2023): 5009-5021.

Ingestion method: OAI harvesting

Source: Institute of Automation
