Chinese Academy of Sciences Institutional Repositories Grid
Locate then Segment: A Strong Pipeline for Referring Image Segmentation

Document type: Conference paper

Authors: Jing Y (荆雅)2,3; Kong T (孔涛)1; Wang W (王威)2,3; Wang L (王亮)2,3; Li L (李磊)1; Tan TN (谭铁牛)2,3
Publication date: 2021-06
Conference date: 2021-06
Conference location: virtual
Abstract (English)

Referring image segmentation aims to segment the objects referred to by a natural language expression. Previous methods usually focus on designing an implicit and recurrent feature interaction mechanism that fuses the visual-linguistic features to directly generate the final segmentation mask, without explicitly modeling the localization information of the referent instances. To tackle these problems, we view this task from another perspective by decoupling it into a "Locate-Then-Segment" (LTS) scheme. Given a language expression, people generally first attend to the corresponding target image regions, then generate a fine segmentation mask of the object based on its context. The LTS first extracts and fuses both visual and textual features to get a cross-modal representation, then applies a cross-modal interaction on the visual-textual features to locate the referred object with a position prior, and finally generates the segmentation result with a light-weight segmentation network. Our LTS is simple but surprisingly effective. On three popular benchmark datasets, the LTS outperforms all previous state-of-the-art methods by a large margin (e.g., +3.2% on RefCOCO+ and +3.4% on RefCOCOg). In addition, our model is more interpretable because it explicitly locates the object, which visualization experiments also confirm. We believe this framework is promising to serve as a strong baseline for referring image segmentation.
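The three stages named in the abstract (fuse, locate, segment) can be illustrated with a toy sketch. This is not the authors' network: the element-wise product used for fusion, the summed response used as the position prior, and the hand-set linear head (`HEAD_WEIGHTS`, `HEAD_BIAS`) are all assumptions chosen only to make the data flow visible, and every function name here is hypothetical.

```python
import math


def sigmoid(x):
    """Squash a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(visual, text):
    """Stage 1 (assumed form): fuse each pixel's visual vector with the
    sentence vector by element-wise product."""
    return [[[v * t for v, t in zip(pixel, text)] for pixel in row]
            for row in visual]


def locate(fused):
    """Stage 2 (assumed form): turn the fused response at each pixel into a
    coarse position prior in (0, 1)."""
    return [[sigmoid(sum(pixel)) for pixel in row] for row in fused]


# Hypothetical hand-set head: two fused channels plus one prior channel.
HEAD_WEIGHTS = [0.0, 0.0, 4.0]
HEAD_BIAS = -2.0


def segment(fused, prior, weights=HEAD_WEIGHTS, bias=HEAD_BIAS, threshold=0.5):
    """Stage 3 (assumed form): append the prior to the fused features as an
    extra channel and apply a tiny linear head to get a binary mask."""
    mask = []
    for fused_row, prior_row in zip(fused, prior):
        mask_row = []
        for pixel, p in zip(fused_row, prior_row):
            features = pixel + [p]  # concatenate the position prior
            logit = sum(w * f for w, f in zip(weights, features)) + bias
            mask_row.append(1 if sigmoid(logit) > threshold else 0)
        mask.append(mask_row)
    return mask


def locate_then_segment(visual, text):
    """Run the three toy stages in order and return (prior, mask)."""
    fused = fuse(visual, text)
    prior = locate(fused)
    return prior, segment(fused, prior)


# A 2x2 "image" with 2-channel features; only the top-left pixel matches
# the toy sentence vector.
visual = [[[2.0, 0.0], [0.0, 2.0]],
          [[0.0, 2.0], [0.0, 2.0]]]
text = [1.0, -1.0]
prior, mask = locate_then_segment(visual, text)
print(mask)  # [[1, 0], [0, 0]]
```

The point of the sketch is the decoupling: the prior is computed first and then handed to the segmentation step as an explicit input, so the intermediate localization can be inspected on its own.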
 

Source URL: http://ir.ia.ac.cn/handle/173211/44447
Collection: Institute of Automation_Center for Research on Intelligent Perception and Computing
Author affiliations:
1. ByteDance AI Lab
2. Center for Research on Intelligent Perception and Computing (CRIPAC), National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA)
3. School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS)
Recommended citation
GB/T 7714
Jing Y, Kong T, Wang W, et al. Locate then Segment: A Strong Pipeline for Referring Image Segmentation[C]. In: . virtual. 2021-6.

Ingest method: OAI harvesting

Source: Institute of Automation

