Chinese Academy of Sciences Institutional Repositories Grid
CGFormer: ViT-Based Network for Identifying Computer-Generated Images With Token Labeling

Document Type: Journal Article

Authors: Quan, Weize (1,2,3,4); Deng, Pengfei (1,2,3); Wang, Kai (3); Yan, Dong-Ming (1,2,3)
Journal: IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY
Publication Date: 2024
Volume: 19, Pages: 235-250
ISSN: 1556-6013
Keywords: CG image forensics; transformer; token labeling; generalization; robustness
DOI: 10.1109/TIFS.2023.3322083
Corresponding Author: Yan, Dong-Ming (yandongming@gmail.com)
Abstract: Advanced graphics rendering techniques and image generation algorithms have significantly improved the visual quality of computer-generated (CG) images, making it more challenging for a forensic detector to distinguish between CG images and natural images (NIs). To identify CG images, human beings often need to inspect and evaluate both the entire image and its local regions. In addition, we observe that the distributions of both near and far patch-wise correlations differ between CG images and NIs. Current mainstream methods adopt CNN-based architectures with the classical cross entropy loss; however, they have several limitations: 1) weak modeling of long-distance relationships in image content due to the local receptive field of CNNs; 2) pixel sensitivity due to the convolutional computation; 3) insufficient supervision because the training loss is applied only to the whole image. In this paper, we propose a novel vision transformer (ViT)-based network with token labeling for CG image identification. Our network, called CGFormer, consists of patch embedding, feature modeling, and token prediction. We apply patch embedding to sequence the input image and weaken pixel sensitivity. Stacked multi-head attention-based transformer blocks are utilized to model the patch-wise relationships and introduce a certain level of adaptability. Besides the conventional classification loss on the class token of the whole image, we additionally introduce a soft cross entropy loss on patch tokens to comprehensively exploit the supervision information from local patches. Extensive experiments demonstrate that our method achieves state-of-the-art forensic performance on six publicly available datasets in terms of classification accuracy, generalization, and robustness. Code is available at https://github.com/feipiefei/CGFormer.
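The abstract's central training idea, supervising both the whole-image class token and the individual patch tokens, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation (see the linked repository for that); the module name TokenLabelingLoss, the head output shapes, and the loss weight are assumptions made purely for illustration.

```python
# Minimal sketch (not the authors' code) of a token-labeling objective:
# standard cross entropy on the class token plus a soft cross entropy on
# per-patch tokens, as described in the abstract. Names and weights are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

def soft_cross_entropy(logits, soft_targets):
    # logits: (B, N, 2) per-patch predictions; soft_targets: (B, N, 2) soft labels.
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()

class TokenLabelingLoss(nn.Module):
    """Combines whole-image and patch-level supervision (hypothetical weighting)."""
    def __init__(self, patch_weight=0.5):
        super().__init__()
        self.patch_weight = patch_weight
        self.cls_loss = nn.CrossEntropyLoss()

    def forward(self, cls_logits, patch_logits, image_labels, patch_soft_labels):
        # cls_logits: (B, 2) from the class token; image_labels: (B,) in {CG=0, NI=1}
        # patch_logits: (B, N, 2) from patch tokens; patch_soft_labels: (B, N, 2)
        loss_cls = self.cls_loss(cls_logits, image_labels)
        loss_patch = soft_cross_entropy(patch_logits, patch_soft_labels)
        return loss_cls + self.patch_weight * loss_patch

# Example usage with random tensors (batch of 4 images, 196 patches):
B, N = 4, 196
criterion = TokenLabelingLoss(patch_weight=0.5)
loss = criterion(torch.randn(B, 2), torch.randn(B, N, 2),
                 torch.randint(0, 2, (B,)),
                 torch.softmax(torch.randn(B, N, 2), dim=-1))
```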
WOS Keywords: NATURAL IMAGES; GRAPHICS
Funding Project: National Natural Science Foundation of China
WOS Research Areas: Computer Science; Engineering
Language: English
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
WOS Record Number: WOS:001123966000035
Funding Agency: National Natural Science Foundation of China
Source URL: http://ir.ia.ac.cn/handle/173211/54928
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems
Corresponding Author: Yan, Dong-Ming
Author Affiliations:
1. Chinese Acad Sci, Inst Automat, MAIS, Beijing 100190, Peoples R China
2. Chinese Acad Sci, Inst Automat, NLPR, Beijing 100190, Peoples R China
3. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
4. Guangdong Prov Key Lab Novel Secur Intelligence Te, Shenzhen 518055, Peoples R China
Recommended Citation:
GB/T 7714: Quan, Weize, Deng, Pengfei, Wang, Kai, et al. CGFormer: ViT-Based Network for Identifying Computer-Generated Images With Token Labeling[J]. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19: 235-250.
APA: Quan, Weize, Deng, Pengfei, Wang, Kai, & Yan, Dong-Ming. (2024). CGFormer: ViT-Based Network for Identifying Computer-Generated Images With Token Labeling. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 19, 235-250.
MLA: Quan, Weize, et al. "CGFormer: ViT-Based Network for Identifying Computer-Generated Images With Token Labeling". IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY 19 (2024): 235-250.

Deposit Method: OAI Harvesting

Source: Institute of Automation

