Chinese Academy of Sciences Institutional Repositories Grid
CGFormer: ViT-Based Network for Identifying Computer-Generated Images With Token Labeling

Document Type: Journal Article

Authors: Quan, Weize (1,2,3,4); Deng, Pengfei (1,2,3); Wang, Kai (3); Yan, Dong-Ming (1,2,3)
Journal: IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY
Publication Date: 2024
Volume: 19, Pages: 235-250
ISSN: 1556-6013
Keywords: CG image forensics; transformer; token labeling; generalization; robustness
DOI: 10.1109/TIFS.2023.3322083
Corresponding Author: Yan, Dong-Ming (yandongming@gmail.com)
Abstract: Advanced graphics rendering techniques and image generation algorithms have significantly improved the visual quality of computer-generated (CG) images, making it more challenging for a forensic detector to distinguish between CG images and natural images (NIs). To identify CG images, human beings often need to inspect and evaluate both the entire image and its local regions. In addition, we observe that the distributions of both near and far patch-wise correlations differ between CG images and NIs. Current mainstream methods adopt CNN-based architectures with the classical cross entropy loss; however, they have several limitations: 1) weak modeling of long-distance relationships in image content due to the local receptive field of CNNs; 2) pixel sensitivity due to the convolutional computation; 3) insufficient supervision because the training loss is applied only to the whole image. In this paper, we propose a novel vision transformer (ViT)-based network with token labeling for CG image identification. Our network, called CGFormer, consists of patch embedding, feature modeling, and token prediction. We apply patch embedding to sequence the input image and weaken pixel sensitivity. Stacked multi-head attention-based transformer blocks are utilized to model the patch-wise relationships and introduce a certain level of adaptability. Besides the conventional classification loss on the class token of the whole image, we additionally introduce a soft cross entropy loss on patch tokens to comprehensively exploit the supervision information from local patches. Extensive experiments demonstrate that our method achieves state-of-the-art forensic performance on six publicly available datasets in terms of classification accuracy, generalization, and robustness. Code is available at https://github.com/feipiefei/CGFormer.
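The abstract's central training idea, supervising both the whole-image class token and the individual patch tokens, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation (see the linked repository for that); the module name TokenLabelingLoss, the head output shapes, and the loss weight are assumptions made purely for illustration.

```python
# Minimal sketch (not the authors' code) of a token-labeling objective:
# standard cross entropy on the class token plus a soft cross entropy on
# per-patch tokens, as described in the abstract. Names and weights are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

def soft_cross_entropy(logits, soft_targets):
    # logits: (B, N, 2) per-patch predictions; soft_targets: (B, N, 2) soft labels.
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()

class TokenLabelingLoss(nn.Module):
    """Combines whole-image and patch-level supervision (hypothetical weighting)."""
    def __init__(self, patch_weight=0.5):
        super().__init__()
        self.patch_weight = patch_weight
        self.cls_loss = nn.CrossEntropyLoss()

    def forward(self, cls_logits, patch_logits, image_labels, patch_soft_labels):
        # cls_logits: (B, 2) from the class token; image_labels: (B,) in {CG=0, NI=1}
        # patch_logits: (B, N, 2) from patch tokens; patch_soft_labels: (B, N, 2)
        loss_cls = self.cls_loss(cls_logits, image_labels)
        loss_patch = soft_cross_entropy(patch_logits, patch_soft_labels)
        return loss_cls + self.patch_weight * loss_patch

# Example usage with random tensors (batch of 4 images, 196 patches):
B, N = 4, 196
criterion = TokenLabelingLoss(patch_weight=0.5)
loss = criterion(torch.randn(B, 2), torch.randn(B, N, 2),
                 torch.randint(0, 2, (B,)),
                 torch.softmax(torch.randn(B, N, 2), dim=-1))
```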
WOS Keywords: NATURAL IMAGES; GRAPHICS
Funding Project: National Natural Science Foundation of China
WOS Research Areas: Computer Science; Engineering
Language: English
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
WOS Record Number: WOS:001123966000035
Funding Agency: National Natural Science Foundation of China
Source URL: http://ir.ia.ac.cn/handle/173211/54928
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems
Corresponding Author: Yan, Dong-Ming
Author Affiliations:
1. Chinese Acad Sci, Inst Automat, MAIS, Beijing 100190, Peoples R China
2. Chinese Acad Sci, Inst Automat, NLPR, Beijing 100190, Peoples R China
3. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
4. Guangdong Prov Key Lab Novel Secur Intelligence Te, Shenzhen 518055, Peoples R China
Recommended Citation:
GB/T 7714: Quan, Weize, Deng, Pengfei, Wang, Kai, et al. CGFormer: ViT-Based Network for Identifying Computer-Generated Images With Token Labeling[J]. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19: 235-250.
APA: Quan, Weize, Deng, Pengfei, Wang, Kai, & Yan, Dong-Ming. (2024). CGFormer: ViT-Based Network for Identifying Computer-Generated Images With Token Labeling. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 19, 235-250.
MLA: Quan, Weize, et al. "CGFormer: ViT-Based Network for Identifying Computer-Generated Images With Token Labeling". IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY 19 (2024): 235-250.

Deposit Method: OAI Harvesting

Source: Institute of Automation

