Chinese Academy of Sciences Institutional Repositories Grid (CAS IR Grid): FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing

FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing

Document type: Journal article


Authors	Liu, Ajian 4; Tan, Zichang 1,3; Yu, Zitong 5; Zhao, Chenxu 6; Wan, Jun 4,7,8; Liang, Yanyan 8; Lei, Zhen 4,7,9; Zhang, Du 8; Li, Stan Z. 8,10; Guo, Guodong 2
Journal	IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY
Publication date	2023
Volume	18	Pages: 4775-4786
Keywords	Face anti-spoofing; flexible-modal testing; vision transformer; mutual-attention; fusion-attention
ISSN	1556-6013
DOI	10.1109/TIFS.2023.3296330
Corresponding author	Wan, Jun (jun.wan@ia.ac.cn)
Abstract	The availability of handy multi-modal (i.e., RGB-D) sensors has brought about a surge of face anti-spoofing research. However, the current multi-modal face presentation attack detection (PAD) has two defects: (1) The framework based on multi-modal fusion requires providing modalities consistent with the training input, which seriously limits the deployment scenario. (2) The performance of ConvNet-based model on high fidelity datasets is increasingly limited. In this work, we present a pure transformer-based framework, dubbed the Flexible Modal Vision Transformer (FM-ViT), for face anti-spoofing to flexibly target any single-modal (i.e., RGB) attack scenarios with the help of available multi-modal data. Specifically, FM-ViT retains a specific branch for each modality to capture different modal information and introduces the Cross-Modal Transformer Block (CMTB), which consists of two cascaded attentions named Multi-headed Mutual-Attention (MMA) and Fusion-Attention (MFA) to guide each modal branch to mine potential features from informative patch tokens, and to learn modality-agnostic liveness features by enriching the modal information of own CLS token, respectively. Experiments demonstrate that the single model trained based on FM-ViT can not only flexibly evaluate different modal samples, but also outperforms existing single-modal frameworks by a large margin, and approaches the multi-modal frameworks introduced with smaller FLOPs and model parameters.
WOS keywords	PRESENTATION ATTACK DETECTION ; IMAGE
Funding projects	National Key Research and Development Plan[2021YFF0602103] ; External Cooperation Key Project of Chinese Academy Sciences[173211KYSB20200002] ; Chinese National Natural Science Foundation Project[62276254] ; Science and Technology Development Fund of Macau Project[0123/2022/A3] ; Science and Technology Development Fund of Macau Project[0004/2020/A1] ; Science and Technology Development Fund of Macau Project[0070/2020/AMJ] ; Guangdong Provincial Key Research and Development Program[2019B010148001] ; China Computer Federation (CCF)-Zhipu Artificial Intelligent (AI) Large Model[202219] ; InnoHK Program
WOS research areas	Computer Science ; Engineering
Language	English
WOS accession number	WOS:001045264200006
Publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Funding organizations	National Key Research and Development Plan ; External Cooperation Key Project of Chinese Academy Sciences ; Chinese National Natural Science Foundation Project ; Science and Technology Development Fund of Macau Project ; Guangdong Provincial Key Research and Development Program ; China Computer Federation (CCF)-Zhipu Artificial Intelligent (AI) Large Model ; InnoHK Program
Source URL	http://ir.ia.ac.cn/handle/173211/53915
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems
Author affiliations	1. Baidu Res, Inst Deep Learning, Beijing 100094, Peoples R China
	2. Universal Ubiquitous Co, Hangzhou 311202, Zhejiang, Peoples R China
	3. Baidu Res, Natl Engn Lab Deep Learning Technol & Applicat, Beijing 100094, Peoples R China
	4. Chinese Acad Sci CASIA, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China
	5. Great Bay Univ, Sch Comp & Informat Technol, Dongguan 523000, Peoples R China
	6. Mininglamp Technol, Mininglamp Acad Sci, Beijing 322006, Peoples R China
	7. Univ Chinese Acad Sci UCAS, Sch Artificial Intelligence, Beijing 100049, Peoples R China
	8. Macau Univ Sci & Technol, Fac Innovat Engn, Sch Comp Sci & Engn, Macau 999078, Peoples R China
	9. Chinese Acad Sci, Hong Kong Inst Sci & Innovat, Ctr Artificial Intelligence & Robot, Hong Kong, Peoples R China
	10. Westlake Univ, Sch Engn, Hangzhou 310024, Peoples R China
Recommended citation (GB/T 7714)	Liu, Ajian, Tan, Zichang, Yu, Zitong, et al. FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing[J]. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2023, 18: 4775-4786.
APA	Liu, Ajian, Tan, Zichang, Yu, Zitong, Zhao, Chenxu, Wan, Jun, ... & Guo, Guodong. (2023). FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 18, 4775-4786.
MLA	Liu, Ajian, et al. "FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing." IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY 18 (2023): 4775-4786.

Ingest method: OAI harvesting

Source: Institute of Automation
