Chinese Academy of Sciences Institutional Repositories Grid
FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing

Document type: Journal article

Authors: Liu, Ajian (4); Tan, Zichang (1,3); Yu, Zitong (5); Zhao, Chenxu (6); Wan, Jun (4,7,8); Liang, Yanyan (8); Lei, Zhen (4,7,9); Zhang, Du (8); Li, Stan Z. (8,10); Guo, Guodong (2)
Journal: IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY
Publication date: 2023
Volume: 18; Pages: 4775-4786
Keywords: Face anti-spoofing; flexible-modal testing; vision transformer; mutual-attention; fusion-attention
ISSN: 1556-6013
DOI: 10.1109/TIFS.2023.3296330
Corresponding author: Wan, Jun (jun.wan@ia.ac.cn)
Abstract: The availability of handy multi-modal (i.e., RGB-D) sensors has brought about a surge of face anti-spoofing research. However, current multi-modal face presentation attack detection (PAD) has two defects: (1) frameworks based on multi-modal fusion require the same modalities at test time as were used for training, which seriously limits deployment scenarios; (2) the performance of ConvNet-based models on high-fidelity datasets is increasingly limited. In this work, we present a pure transformer-based framework for face anti-spoofing, dubbed the Flexible Modal Vision Transformer (FM-ViT), which flexibly targets any single-modal (i.e., RGB) attack scenario with the help of the available multi-modal data. Specifically, FM-ViT retains a specific branch for each modality to capture different modal information and introduces the Cross-Modal Transformer Block (CMTB), which consists of two cascaded attentions named Multi-headed Mutual-Attention (MMA) and Multi-headed Fusion-Attention (MFA). MMA guides each modal branch to mine potential features from informative patch tokens, and MFA learns modality-agnostic liveness features by enriching the modal information of the branch's own CLS token. Experiments demonstrate that a single model trained with FM-ViT can not only flexibly evaluate samples of different modalities, but also outperforms existing single-modal frameworks by a large margin and approaches the performance of multi-modal frameworks while using fewer FLOPs and model parameters.
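The abstract describes the CMTB only at a high level. As a rough illustration of the two attention patterns it names, the following sketch in plain Python uses single-head scaled dot-product attention in two ways: a mutual-attention step in which one modality's patch tokens query another modality's patch tokens, and a fusion-attention step in which a branch's CLS token queries the patch tokens of whichever modalities are available. This is not the authors' implementation: learned query/key/value projections, multiple heads, informative-token selection, and normalization layers are all omitted, and the function names (`attend`, `mutual_attention`, `fusion_attention`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def mutual_attention(patches_a: List[Vector], patches_b: List[Vector]) -> List[Vector]:
    """Hypothetical mutual-attention step (simplified assumption): each patch
    token of modality A queries the patch tokens of modality B, and the
    attended context is added back residually."""
    out = []
    for tok in patches_a:
        ctx = attend(tok, patches_b, patches_b)
        out.append([t + c for t, c in zip(tok, ctx)])
    return out


def fusion_attention(cls_token: Vector, available_patches: List[List[Vector]]) -> Vector:
    """Hypothetical fusion-attention step (simplified assumption): a branch's
    CLS token queries the patch tokens of every modality that is present,
    and the attended context is added back residually."""
    pooled = [tok for patches in available_patches for tok in patches]
    ctx = attend(cls_token, pooled, pooled)
    return [c + x for c, x in zip(cls_token, ctx)]


if __name__ == "__main__":
    rgb = [[1.0, 0.0], [0.0, 1.0]]
    depth = [[0.5, 0.5]]
    cls_rgb = [0.0, 0.0]
    # Training-style call: both modalities are available.
    print(fusion_attention(cls_rgb, [rgb, depth]))
    # Test-style call: only the RGB branch is available.
    print(fusion_attention(cls_rgb, [rgb]))
    # RGB patch tokens attending to depth patch tokens.
    print(mutual_attention(rgb, depth))
```

Because `fusion_attention` accepts whatever list of modalities is present, the same call runs when only RGB tokens exist at test time, which loosely mirrors the flexible-modal testing setting described in the abstract.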
WoS keywords: PRESENTATION ATTACK DETECTION; IMAGE
Funding projects: National Key Research and Development Plan [2021YFF0602103]; External Cooperation Key Project of Chinese Academy Sciences [173211KYSB20200002]; Chinese National Natural Science Foundation Project [62276254]; Science and Technology Development Fund of Macau Project [0123/2022/A3]; Science and Technology Development Fund of Macau Project [0004/2020/A1]; Science and Technology Development Fund of Macau Project [0070/2020/AMJ]; Guangdong Provincial Key Research and Development Program [2019B010148001]; China Computer Federation (CCF)-Zhipu Artificial Intelligent (AI) Large Model [202219]; InnoHK Program
WoS research areas: Computer Science; Engineering
Language: English
WoS accession number: WOS:001045264200006
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Funding agencies: National Key Research and Development Plan; External Cooperation Key Project of Chinese Academy Sciences; Chinese National Natural Science Foundation Project; Science and Technology Development Fund of Macau Project; Guangdong Provincial Key Research and Development Program; China Computer Federation (CCF)-Zhipu Artificial Intelligent (AI) Large Model; InnoHK Program
Source URL: http://ir.ia.ac.cn/handle/173211/53915
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems
Author affiliations:
1. Baidu Res, Inst Deep Learning, Beijing 100094, Peoples R China
2. Universal Ubiquitous Co, Hangzhou 311202, Zhejiang, Peoples R China
3. Baidu Res, Natl Engn Lab Deep Learning Technol & Applicat, Beijing 100094, Peoples R China
4. Chinese Acad Sci CASIA, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China
5. Great Bay Univ, Sch Comp & Informat Technol, Dongguan 523000, Peoples R China
6. Mininglamp Technol, Mininglamp Acad Sci, Beijing 322006, Peoples R China
7. Univ Chinese Acad Sci UCAS, Sch Artificial Intelligence, Beijing 100049, Peoples R China
8. Macau Univ Sci & Technol, Fac Innovat Engn, Sch Comp Sci & Engn, Macau 999078, Peoples R China
9. Chinese Acad Sci, Hong Kong Inst Sci & Innovat, Ctr Artificial Intelligence & Robot, Hong Kong, Peoples R China
10. Westlake Univ, Sch Engn, Hangzhou 310024, Peoples R China
Recommended citation
GB/T 7714: Liu, Ajian, Tan, Zichang, Yu, Zitong, et al. FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing[J]. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2023, 18: 4775-4786.
APA: Liu, A., Tan, Z., Yu, Z., Zhao, C., Wan, J., ... & Guo, G. (2023). FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing. IEEE Transactions on Information Forensics and Security, 18, 4775-4786.
MLA: Liu, Ajian, et al. "FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing." IEEE Transactions on Information Forensics and Security 18 (2023): 4775-4786.

Deposit method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.