FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing
Document type: Journal article
Authors | Liu, Ajian (4); Tan, Zichang (1, 3) |
Journal | IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY |
Publication date | 2023 |
Volume | 18; Pages: 4775-4786 |
Keywords | Face anti-spoofing; flexible-modal testing; vision transformer; mutual-attention; fusion-attention |
ISSN | 1556-6013 |
DOI | 10.1109/TIFS.2023.3296330 |
Corresponding author | Wan, Jun (jun.wan@ia.ac.cn) |
Abstract (English) | The availability of handy multi-modal (i.e., RGB-D) sensors has brought about a surge in face anti-spoofing research. However, current multi-modal face presentation attack detection (PAD) has two defects: (1) frameworks based on multi-modal fusion require the same modalities at test time as were provided during training, which seriously limits deployment scenarios; (2) the performance of ConvNet-based models on high-fidelity datasets is increasingly limited. In this work, we present a pure transformer-based framework for face anti-spoofing, dubbed the Flexible Modal Vision Transformer (FM-ViT), which flexibly targets any single-modal (e.g., RGB) attack scenario with the help of available multi-modal data. Specifically, FM-ViT retains a specific branch for each modality to capture different modal information and introduces the Cross-Modal Transformer Block (CMTB), which consists of two cascaded attentions named Multi-headed Mutual-Attention (MMA) and Fusion-Attention (MFA). MMA guides each modal branch to mine potential features from informative patch tokens, and MFA learns modality-agnostic liveness features by enriching the modal information of the branch's own CLS token. Experiments demonstrate that a single model trained with FM-ViT can not only flexibly evaluate samples of different modalities, but also outperforms existing single-modal frameworks by a large margin and approaches the performance of multi-modal frameworks while using fewer FLOPs and model parameters. |
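The abstract describes the two cascaded attentions only at a high level. As a rough illustration, and not the authors' implementation, the sketch below shows generic single-head scaled dot-product attention used in two roles: a mutual-attention step in which one modality's tokens query another modality's patch tokens, and a fusion step in which a branch's own CLS token queries the CLS tokens of all available modalities. The function names (`mutual_attention`, `fusion_attention`), the use of a single head, the omission of learned query/key/value projections, and the residual additions are all simplifying assumptions made for this example.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention for one query vector.
    Assumption: identity projections (no learned W_q, W_k, W_v)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted sum of the value vectors, one output coordinate at a time.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def mutual_attention(tokens_a: List[Vector], tokens_b: List[Vector]) -> List[Vector]:
    """Hypothetical mutual-attention step: every token of modality A queries
    the patch tokens of modality B; the result is added back as a residual."""
    return [
        [x + y for x, y in zip(tok, attend(tok, tokens_b, tokens_b))]
        for tok in tokens_a
    ]


def fusion_attention(cls_own: Vector, cls_all: List[Vector]) -> Vector:
    """Hypothetical fusion-attention step: a branch's own CLS token queries the
    CLS tokens of all available modalities (including itself), plus a residual."""
    fused = attend(cls_own, cls_all, cls_all)
    return [x + y for x, y in zip(cls_own, fused)]


if __name__ == "__main__":
    # Toy 2-dimensional tokens for two hypothetical modalities.
    rgb_patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    depth_patches = [[0.2, 0.8], [0.9, 0.1]]
    print(mutual_attention(rgb_patches, depth_patches))
    print(fusion_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```

When only one modality is available (for example, RGB alone), `fusion_attention` can be called with `cls_all` containing just the branch's own CLS token, in which case the attention weight collapses to 1 on that token. This merely echoes the idea of flexible-modal testing; it says nothing about how FM-ViT itself handles missing modalities.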
WOS keywords | PRESENTATION ATTACK DETECTION; IMAGE |
Funding projects | National Key Research and Development Plan[2021YFF0602103]; External Cooperation Key Project of Chinese Academy Sciences[173211KYSB20200002]; Chinese National Natural Science Foundation Project[62276254]; Science and Technology Development Fund of Macau Project[0123/2022/A3]; Science and Technology Development Fund of Macau Project[0004/2020/A1]; Science and Technology Development Fund of Macau Project[0070/2020/AMJ]; Guangdong Provincial Key Research and Development Program[2019B010148001]; China Computer Federation (CCF)-Zhipu Artificial Intelligent (AI) Large Model[202219]; InnoHK Program |
WOS research areas | Computer Science; Engineering |
Language | English |
WOS accession number | WOS:001045264200006 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
Funding agencies | National Key Research and Development Plan; External Cooperation Key Project of Chinese Academy Sciences; Chinese National Natural Science Foundation Project; Science and Technology Development Fund of Macau Project; Guangdong Provincial Key Research and Development Program; China Computer Federation (CCF)-Zhipu Artificial Intelligent (AI) Large Model; InnoHK Program |
Source URL | http://ir.ia.ac.cn/handle/173211/53915 |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems |
Corresponding author | Wan, Jun |
Affiliations | 1. Baidu Res, Inst Deep Learning, Beijing 100094, Peoples R China; 2. Universal Ubiquitous Co, Hangzhou 311202, Zhejiang, Peoples R China; 3. Baidu Res, Natl Engn Lab Deep Learning Technol & Applicat, Beijing 100094, Peoples R China; 4. Chinese Acad Sci CASIA, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China; 5. Great Bay Univ, Sch Comp & Informat Technol, Dongguan 523000, Peoples R China; 6. Mininglamp Technol, Mininglamp Acad Sci, Beijing 322006, Peoples R China; 7. Univ Chinese Acad Sci UCAS, Sch Artificial Intelligence, Beijing 100049, Peoples R China; 8. Macau Univ Sci & Technol, Fac Innovat Engn, Sch Comp Sci & Engn, Macau 999078, Peoples R China; 9. Chinese Acad Sci, Hong Kong Inst Sci & Innovat, Ctr Artificial Intelligence & Robot, Hong Kong, Peoples R China; 10. Westlake Univ, Sch Engn, Hangzhou 310024, Peoples R China |
Recommended citation (GB/T 7714) | Liu, Ajian,Tan, Zichang,Yu, Zitong,et al. FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing[J]. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY,2023,18:4775-4786. |
APA | Liu, Ajian, Tan, Zichang, Yu, Zitong, Zhao, Chenxu, Wan, Jun, ... & Guo, Guodong. (2023). FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 18, 4775-4786. |
MLA | Liu, Ajian, et al. "FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing." IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY 18 (2023): 4775-4786. |
Ingest method: OAI harvesting
Source: Institute of Automation
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.