Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification
Document type: Journal article
Authors | Ji, Ruyi [4,5]; Li, Jiaying [3]; Zhang, Libo [5]; Liu, Jing [1,2]; Wu, Yanjun [5] |
Journal | IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY |
Publication date | 2023-09-01 |
Volume | 33; Issue: 9; Pages: 5009-5021 |
ISSN | 1051-8215 |
Keywords | Transformer; multi-grained assembly; fine-grained visual classification |
DOI | 10.1109/TCSVT.2023.3248791 |
Corresponding author | Zhang, Libo (libo@iscas.ac.cn) |
Abstract | Fine-grained visual classification requires distinguishing sub-categories within the same super-category, which suffers from small inter-class and large intra-class variances. This paper aims to improve the FGVC task towards better performance, for which we deliver a novel dual Transformer framework (coined Dual-TR) with multi-grained assembly. The Dual-TR is well-designed to encode fine-grained objects by two parallel hierarchies, which is amenable to capturing the subtle yet discriminative cues via the self-attention mechanism in ViT. Specifically, we perform orthogonal multi-grained assembly within the Transformer structure for a more robust representation, i.e., intra-layer and inter-layer assembly. The former aims to explore the informative feature in various self-attention heads within the Transformer layer. The latter pays attention to the token assembly across Transformer layers. Meanwhile, we introduce the constraint of center loss to pull intra-class samples' compactness and push that of inter-class samples. Extensive experiments show that Dual-TR performs on par with the state-of-the-art methods on four public benchmarks, including CUB-200-2011, NABirds, iNaturalist2017, and Stanford Dogs. The comprehensive ablation studies further demonstrate the effectiveness of architectural design choices. |
Funding projects | Key Research Program of Frontier Sciences, CAS [ZDBSLY-JSC038]; CAAI-Huawei MindSpore Open Fund and Youth Innovation Promotion Association, CAS [2020111] |
WOS research area | Engineering |
Language | English |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
WOS accession number | WOS:001063316800042 |
Funding organizations | Key Research Program of Frontier Sciences, CAS; CAAI-Huawei MindSpore Open Fund and Youth Innovation Promotion Association, CAS |
Source URL | [http://ir.ia.ac.cn/handle/173211/53131] |
Collection | Zidong Taichu Large Model Research Center (紫东太初大模型研究中心) |
Author affiliations | 1. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China; 2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 101400, Peoples R China; 3. Beijing Informat Sci & Technol Univ, Sch Comp Sci, Beijing 100192, Peoples R China; 4. Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 101400, Peoples R China; 5. Chinese Acad Sci, State Key Lab Comp Sci, Inst Software, Beijing 100190, Peoples R China |
Recommended citation (GB/T 7714) | Ji, Ruyi, Li, Jiaying, Zhang, Libo, et al. Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification[J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33(9): 5009-5021. |
APA | Ji, Ruyi, Li, Jiaying, Zhang, Libo, Liu, Jing, & Wu, Yanjun. (2023). Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 33(9), 5009-5021. |
MLA | Ji, Ruyi, et al. "Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification". IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 33.9 (2023): 5009-5021. |
Ingestion method: OAI harvesting
Source: Institute of Automation