Learning Spatiotemporal Inconsistency via Thumbnail Layout for Face Deepfake Detection
Document type: Journal article
Author | Xu, Yuting (affiliations 1, 4) |
Journal | INTERNATIONAL JOURNAL OF COMPUTER VISION |
Publication date | 2024-06-24 |
Pages | 18 |
Keywords | Forgery detection; Thumbnail; Spatiotemporal inconsistency; Graph reasoning; Vision transformer |
ISSN | 0920-5691 |
DOI | 10.1007/s11263-024-02054-2 |
Corresponding authors | Liang, Jian (liangjian92@gmail.com); Zhang, Xiao-Yu (zhangxiaoyu@iie.ac.cn) |
Abstract | The threats that deepfakes pose to society and cybersecurity have provoked significant public concern, driving intensified research on deepfake video detection. Current video-level methods achieve good performance but are mostly based on 3D CNNs, which incur high computational costs. This paper introduces a simple yet effective strategy named Thumbnail Layout (TALL), which transforms a video clip into a predefined layout that preserves spatial and temporal dependencies. The transformation first masks the same position in each of the consecutive frames. The frames are then resized into sub-frames and rearranged into the predefined layout, forming a thumbnail. TALL is model-agnostic and remarkably simple, requiring only minimal code modifications. Furthermore, we introduce a graph reasoning block (GRB) and a semantic consistency (SC) loss to strengthen TALL, yielding TALL++. The GRB enhances interactions between different semantic regions to capture semantic-level inconsistency clues, and the SC loss imposes consistency constraints on semantic features to improve generalization. Extensive experiments on intra-dataset and cross-dataset detection, diffusion-generated image detection, and deepfake generation method recognition show that TALL++ achieves results that surpass or are comparable to state-of-the-art methods, demonstrating the effectiveness of our approach across a range of deepfake detection problems. The code is available at https://github.com/rainy-xu/TALL4Deepfake. |
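The abstract describes the TALL transformation only at a high level (mask the same position in every frame, resize frames into sub-frames, tile them into a predefined layout). The following is a minimal NumPy sketch of that idea under stated assumptions, not the authors' implementation: the clip is an array of shape (T, H, W, C) with T = grid_rows × grid_cols; a single zero-filled square mask is placed at a caller-chosen position (the `top`, `left`, `size` parameters are hypothetical); nearest-neighbour resizing stands in for whatever interpolation the official code uses; and frames are placed in temporal order, row by row. All function names are hypothetical and are not taken from the TALL4Deepfake repository.

```python
import numpy as np


def mask_same_position(clip, top, left, size, fill=0):
    """Fill the same square region in every frame of a (T, H, W, C) clip.

    Assumption: one square mask shared by all frames; returns a masked copy.
    """
    masked = clip.copy()
    masked[:, top:top + size, left:left + size, :] = fill
    return masked


def resize_nearest(frame, out_h, out_w):
    """Nearest-neighbour resize of one (H, W, C) frame (stand-in for bilinear)."""
    h, w = frame.shape[:2]
    rows = np.arange(out_h) * h // out_h  # source row index for each output row
    cols = np.arange(out_w) * w // out_w  # source column index for each output column
    return frame[rows[:, None], cols[None, :]]


def thumbnail_layout(clip, grid_rows, grid_cols):
    """Tile T = grid_rows * grid_cols frames into a single thumbnail image.

    Each frame is shrunk to (H // grid_rows, W // grid_cols) and placed in
    temporal order, row by row, so the thumbnail keeps the size of one frame
    when H and W divide evenly by the grid dimensions.
    """
    t, h, w, c = clip.shape
    if t != grid_rows * grid_cols:
        raise ValueError("clip length must equal grid_rows * grid_cols")
    sub_h, sub_w = h // grid_rows, w // grid_cols
    thumb = np.zeros((sub_h * grid_rows, sub_w * grid_cols, c), dtype=clip.dtype)
    for i, frame in enumerate(clip):
        r, col = divmod(i, grid_cols)  # grid cell for the i-th frame
        thumb[r * sub_h:(r + 1) * sub_h, col * sub_w:(col + 1) * sub_w] = resize_nearest(
            frame, sub_h, sub_w
        )
    return thumb


# Usage: a random 4-frame 224x224 RGB clip arranged into a 2x2 thumbnail.
rng = np.random.default_rng(0)
clip = rng.integers(0, 256, size=(4, 224, 224, 3), dtype=np.uint8)
masked = mask_same_position(clip, top=50, left=60, size=32)
thumb = thumbnail_layout(masked, grid_rows=2, grid_cols=2)
print(thumb.shape)  # (224, 224, 3)
```

Because the thumbnail has the same spatial size as a single frame, it can in principle be passed to an unmodified 2D image backbone, which is one way to read the abstract's claim that TALL is model-agnostic and needs only minimal code changes.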
WOS keywords | RECOGNITION |
Funding projects | National Natural Science Foundation of China (NSFC) [62376265]; National Natural Science Foundation of China (NSFC) [62276256]; National Natural Science Foundation of China (NSFC) [U21B2045]; Beijing Nova Program [Z211100002121108]; Young Elite Scientists Sponsorship Program by CAST |
WOS research area | Computer Science |
Language | English |
WOS accession number | WOS:001255175900003 |
Publisher | SPRINGER |
Funding agencies | National Natural Science Foundation of China (NSFC); Beijing Nova Program; Young Elite Scientists Sponsorship Program by CAST |
Source URL | http://ir.ia.ac.cn/handle/173211/59346 |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems |
Author affiliations | 1. Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China; 2. Chinese Acad Sci, Inst Automat, CRIPAC, Beijing, Peoples R China; 3. Chinese Acad Sci, Inst Automat, MAIS, Beijing, Peoples R China; 4. Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China; 5. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China; 6. Univ Sci & Technol China, Dept Automat, Beijing, Peoples R China |
Recommended citation (GB/T 7714) | Xu, Yuting, Liang, Jian, Sheng, Lijun, et al. Learning Spatiotemporal Inconsistency via Thumbnail Layout for Face Deepfake Detection[J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024: 18. |
APA | Xu, Yuting, Liang, Jian, Sheng, Lijun, & Zhang, Xiao-Yu. (2024). Learning Spatiotemporal Inconsistency via Thumbnail Layout for Face Deepfake Detection. INTERNATIONAL JOURNAL OF COMPUTER VISION, 18. |
MLA | Xu, Yuting, et al. "Learning Spatiotemporal Inconsistency via Thumbnail Layout for Face Deepfake Detection." INTERNATIONAL JOURNAL OF COMPUTER VISION (2024): 18. |
Ingest method: OAI harvesting
Source: Institute of Automation
Unless otherwise noted, all content in this system is protected by copyright, and all rights are reserved.