Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks
Document type: Conference paper
Authors | Chen, Zhiyang2,4 |
Publication date | 2022-11-01 |
Conference date | 2022-11-28 |
Conference location | New Orleans, Louisiana & Online |
Keywords | transformer general visual framework sequence prediction multi-task |
Abstract | Visual tasks vary a lot in their output formats and concerned contents, therefore it is hard to process them with an identical structure. One main obstacle lies in the high-dimensional outputs in object-level visual tasks. In this paper, we propose an object-centric vision framework, Obj2Seq. Obj2Seq takes objects as basic units, and regards most object-level visual tasks as sequence generation problems of objects. Therefore, these visual tasks can be decoupled into two steps. First recognize objects of given categories, and then generate a sequence for each of these objects. The definition of the output sequences varies for different tasks, and the model is supervised by matching these sequences with ground-truth targets. Obj2Seq is able to flexibly determine input categories to satisfy customized requirements, and be easily extended to different visual tasks. When experimenting on MS COCO, Obj2Seq achieves 45.7% AP on object detection, 89.0% AP on multi-label classification and 65.0% AP on human pose estimation. These results demonstrate its potential to be generally applied to different visual tasks. |
Source URL | [http://ir.ia.ac.cn/handle/173211/56593] |
Collection | Zidong Taichu Large Model Research Center_Large Model Computing |
Author affiliations | 1.Peng Cheng Laboratory 2.National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences 3.SenseTime Research 4.School of Artificial Intelligence, University of Chinese Academy of Sciences |
Recommended citation (GB/T 7714) | Chen, Zhiyang,Zhu, Yousong,Li, Zhaowen,et al. Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks[C]. In:. New Orleans, Louisiana & Online. 2022-11-28. |
Ingestion method: OAI harvesting
Source: Institute of Automation