Chinese Academy of Sciences Institutional Repositories Grid

Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks

Document type: Conference paper


Authors	Chen, Zhiyang (2,4); Zhu, Yousong (2); Li, Zhaowen (2,4); Yang, Fan (1,2); Li, Wei (3); Wang, Haixin (2,4); Zhao, Chaoyang (2); Wu, Liwei (3); Zhao, Rui (3); Wang, Jinqiao (1,2,4)
Publication date	2022-11-01
Conference date	2022-11-28
Conference location	New Orleans, Louisiana & Online
Keywords	transformer; general visual framework; sequence prediction; multi-task
Abstract	Visual tasks vary a lot in their output formats and concerned contents, therefore it is hard to process them with an identical structure. One main obstacle lies in the high-dimensional outputs in object-level visual tasks. In this paper, we propose an object-centric vision framework, Obj2Seq. Obj2Seq takes objects as basic units, and regards most object-level visual tasks as sequence generation problems of objects. Therefore, these visual tasks can be decoupled into two steps. First recognize objects of given categories, and then generate a sequence for each of these objects. The definition of the output sequences varies for different tasks, and the model is supervised by matching these sequences with ground-truth targets. Obj2Seq is able to flexibly determine input categories to satisfy customized requirements, and be easily extended to different visual tasks. When experimenting on MS COCO, Obj2Seq achieves 45.7% AP on object detection, 89.0% AP on multi-label classification and 65.0% AP on human pose estimation. These results demonstrate its potential to be generally applied to different visual tasks.
Source URL	http://ir.ia.ac.cn/handle/173211/56593
Collection	Zidong Taichu Large Model Research Center_Large Model Computing
Author affiliations	1. Peng Cheng Laboratory; 2. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; 3. SenseTime Research; 4. School of Artificial Intelligence, University of Chinese Academy of Sciences
Recommended citation (GB/T 7714)	Chen, Zhiyang, Zhu, Yousong, Li, Zhaowen, et al. Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks[C]. In:. New Orleans, Louisiana & Online. 2022-11-28.

Ingestion method: OAI harvesting

Source: Institute of Automation
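
The two-step decoupling described in the abstract (first recognize objects of the given categories, then generate one sequence per object, with the sequence definition depending on the task) can be illustrated with a toy Python sketch. This is not the authors' implementation: `ToyObject`, `recognize`, `generate_sequences`, `obj2seq_toy`, and the two sequence layouts are hypothetical names and assumptions made for illustration, and the candidate objects are handed in directly rather than predicted from an image by a trained model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class ToyObject:
    """Hypothetical stand-in for one object in an image."""
    category: str
    box: Sequence[float]        # assumed layout: (x, y, w, h)
    keypoints: Sequence[float]  # assumed flat layout: (x1, y1, x2, y2, ...)


def detection_sequence(obj: ToyObject) -> List[float]:
    # Assumed sequence definition for detection: the box coordinates only.
    return list(obj.box)


def pose_sequence(obj: ToyObject) -> List[float]:
    # Assumed sequence definition for pose: box followed by keypoints.
    return list(obj.box) + list(obj.keypoints)


# The sequence definition varies per task; this mapping is illustrative.
TASK_SEQUENCES: Dict[str, Callable[[ToyObject], List[float]]] = {
    "detection": detection_sequence,
    "pose": pose_sequence,
}


def recognize(candidates: List[ToyObject], class_prompts: List[str]) -> List[ToyObject]:
    """Step 1: keep only objects whose category was given as input."""
    wanted = set(class_prompts)
    return [obj for obj in candidates if obj.category in wanted]


def generate_sequences(objects: List[ToyObject], task: str) -> List[List[float]]:
    """Step 2: emit one output sequence per retained object."""
    to_sequence = TASK_SEQUENCES[task]
    return [to_sequence(obj) for obj in objects]


def obj2seq_toy(candidates: List[ToyObject], class_prompts: List[str], task: str) -> List[List[float]]:
    return generate_sequences(recognize(candidates, class_prompts), task)


# Example scene with made-up coordinates.
scene = [
    ToyObject("person", (10, 20, 30, 60), (15, 25, 35, 25)),
    ToyObject("dog", (50, 40, 20, 15), ()),
]
print(obj2seq_toy(scene, ["person"], "pose"))
print(obj2seq_toy(scene, ["person", "dog"], "detection"))
```

Changing `class_prompts` changes which objects are returned, and changing `task` changes only the per-object sequence layout, which mirrors the flexibility the abstract claims. The supervision step (matching predicted sequences with ground-truth targets) is omitted from this sketch.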
