Chinese Academy of Sciences Institutional Repositories Grid

TaiSu: A 166M Large-scale High-Quality Dataset for Chinese Vision-Language Pre-training

Document type: Conference paper


Authors	Yulong Liu 1,2; Guibo Zhu 2,3,4; Bin Zhu 5; Qi Song 5; Guojing Ge 2; Haoran Chen 2,4; Guanhui Qiao 2,4; Ru Peng 1; Lingxiang Wu 2; Jinqiao Wang 2,3,4
Publication date	2022-11-28
Conference dates	2022-11-28 to 2022-12-09
Conference venue	New Orleans Convention Center, USA
Abstract	Vision-Language Pre-training (VLP) has been shown to be an efficient method to improve the performance of models on different vision-and-language downstream tasks. Many studies have shown that neural networks may be able to learn some general rules about language and visual concepts from a large-scale weakly labeled image-text dataset. However, most of the public cross-modal datasets that contain more than 100M image-text pairs are in English; there is a lack of available large-scale and high-quality Chinese VLP datasets. In this work, we propose a new framework for automatic dataset acquisition and cleaning, with which we construct a new large-scale and high-quality cross-modal dataset named TaiSu, containing 166 million images and 219 million Chinese captions. Compared with the recently released Wukong dataset, our dataset is built with much stricter restrictions on the semantic correlation of image-text pairs. We also propose to combine texts collected from the web with texts generated by a pre-trained image captioning model. To the best of our knowledge, TaiSu is currently the largest publicly accessible Chinese cross-modal dataset. Furthermore, we test our dataset on several vision-language downstream tasks. TaiSu outperforms BriVL by a large margin on the zero-shot image-text retrieval task and the zero-shot image classification task. TaiSu also shows better performance than Wukong on the image-retrieval task without using image augmentation for training. Results demonstrate that TaiSu can serve as a promising VLP dataset for both understanding and generative tasks. More information is available at https://github.com/ksOAn6g5/TaiSu.
Source URL	[http://ir.ia.ac.cn/handle/173211/57294]
Collection	Zidong Taichu Large Model Research Center (紫东太初大模型研究中心)_Large Model Computing
Corresponding author	Guibo Zhu
Author affiliations	1. Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University 2. Institute of Automation, Chinese Academy of Sciences 3. Wuhan AI Research 4. School of Artificial Intelligence, University of Chinese Academy of Sciences 5. School of Artificial Intelligence, Beijing Normal University
Recommended citation (GB/T 7714)	Yulong Liu, Guibo Zhu, Bin Zhu, et al. TaiSu: A 166M Large-scale High-Quality Dataset for Chinese Vision-Language Pre-training[C]. In:. New Orleans Convention Center, USA. 2022-11-28 to 2022-12-09.

Ingestion method: OAI harvesting

Source: Institute of Automation

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.