Chinese Academy of Sciences Institutional Repositories Grid
BCGN: BLIP-based cross-modal grasping network for language-conditioned robotic grasping

Document type: Journal article

Authors: Xu, Kai (1,4); Wang, Lichun (1); Li, Shuang (2); Xin, Jianjia (3); Yin, Baocai (1)
Journal: MULTIMEDIA SYSTEMS
Publication date: 2025-10-13
Volume: 31; Issue: 6; Pages: 12
Keywords: Robotic grasp; Language-conditioned grasping; Grasping dataset; Cross-modal fusion
ISSN: 0942-4962
DOI: 10.1007/s00530-025-02005-y
Abstract: The performance of robots on the language-conditioned robotic grasping task reflects the intelligence level of robots. However, existing approaches lack the ability to handle implicit instructions and identify infeasible ones, which undermines the intelligence and operational safety of the robot. To overcome the above limitations, this paper introduces a novel Language-conditioned Robotic Grasping Dataset (LRGD), which covers a variety of instruction types. Correspondingly, an end-to-end BLIP-based Cross-modal Grasping Network (BCGN) for language-conditioned grasping is proposed. Specifically, BCGN integrates BLIP to jointly model cross-modal information, and introduces a learnable circuit breaker that enables the model to actively reject infeasible requests. Furthermore, through collaboration with LVLMs (Large Vision-Language Models), BCGN can easily achieve zero-shot recognition of implicit instructions. Experimental results on the LRGD and in real-world scenarios demonstrate the effectiveness of BCGN in dealing with instructions of different complexity levels.
Funding: National Natural Science Foundation of China [2021ZD0111902]; National Key R&D Program of China [62376014]; National Key R&D Program of China [62172022]; National Key R&D Program of China [U21B2038]; National Natural Science Foundation of China [2021JQR023]; Foundation for China University Industry-University-Research Innovation [KM202411232017]; R&D Program of Beijing Municipal Education Commission
WOS research area: Computer Science
Language: English
WOS accession number: WOS:001592913400013
Publisher: SPRINGER
Source URL: [http://119.78.100.204/handle/2XEOYT63/41656]
Collection: Institute of Computing Technology, Chinese Academy of Sciences, Journal Articles (English)
Corresponding authors: Xu, Kai; Wang, Lichun
Affiliations:
1. Beijing Univ Technol, Sch Informat Sci & Technol, Beijing 100124, Peoples R China
2. Beijing Informat Sci & Technol Univ, Sch Automat, Beijing 100192, Peoples R China
3. INSPUR Grp CO LTD, Jinan 250101, Peoples R China
4. Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
Recommended citation
GB/T 7714
Xu, Kai, Wang, Lichun, Li, Shuang, et al. BCGN: BLIP-based cross-modal grasping network for language-conditioned robotic grasping[J]. MULTIMEDIA SYSTEMS, 2025, 31(6): 12.
APA Xu, Kai, Wang, Lichun, Li, Shuang, Xin, Jianjia, & Yin, Baocai. (2025). BCGN: BLIP-based cross-modal grasping network for language-conditioned robotic grasping. MULTIMEDIA SYSTEMS, 31(6), 12.
MLA Xu, Kai, et al. "BCGN: BLIP-based cross-modal grasping network for language-conditioned robotic grasping". MULTIMEDIA SYSTEMS 31.6 (2025): 12.

Ingest method: OAI harvesting

Source: Institute of Computing Technology


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.