Chinese Academy of Sciences Institutional Repositories Grid

M3: Modularization for Multi-task and Multi-agent Offline Pre-training

Document type: Conference paper


Authors	Meng Linghui (2,3); Ruan Jingqing (1,3); Xiong Xuantang (2,3); Li Xiyun (1,3); Zhang Xi (3); Xing Dengpeng (2,3); Xu Bo (2,3)
Publication date	2023-05
Conference dates	2023-05-29 to 2023-06-02
Conference location	London, United Kingdom
Abstract	Learning a multi-task policy is crucial in multi-agent reinforcement learning (MARL). Recent work has focused on online multi-task reinforcement learning, in which a policy is trained jointly from scratch with the aim of generalizing well to few-shot or even zero-shot tasks. However, existing online methods require a tremendous number of interactions and are therefore unsuitable for environments where interactions are expensive. In this work, we introduce modularization for multi-task and multi-agent offline pre-training (M3) to learn high-level transferable policy representations. We argue that discrete policy representations are critical for multi-task offline learning, and we accordingly use contexts as task prompts to improve the adaptability of the pre-trained model to different tasks. To disentangle the variation among multiple agents that arises from their heterogeneous and non-stationary properties, even when they receive the same task, we employ an agent-invariant VQ-VAE to identify each agent. We encapsulate the pre-trained model as part of an online MARL algorithm and fine-tune it to improve generalization. We also theoretically analyze the generalization error of our method. We evaluate the proposed method on challenging StarCraft Multi-Agent Challenge (SMAC) tasks, and empirical results show that it outperforms state-of-the-art MARL methods across multiple tasks in few-shot and even zero-shot settings.
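The abstract's discrete policy representation comes from a VQ-VAE. In a VQ-VAE, each continuous encoder output is replaced by its nearest entry in a learned codebook, and the index of that entry is the discrete code. The NumPy sketch below shows only this generic nearest-codebook lookup. The helper name `quantize`, the toy 2-D codebook, and the inputs are hypothetical illustrations and not details from the paper. The sketch also omits the encoder, the decoder, the training losses, and whatever makes M3's variant agent-invariant.

```python
import numpy as np

def quantize(z_e: np.ndarray, codebook: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Map each row of z_e (N, D) to its nearest codebook entry (K, D).

    Hypothetical helper illustrating the standard VQ-VAE lookup; not M3's code.
    Returns the discrete code indices (N,) and the quantized vectors (N, D).
    """
    # Squared Euclidean distance between every input and every code: shape (N, K)
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)  # one discrete code per input row
    z_q = codebook[indices]         # replace each input with its chosen code vector
    return indices, z_q

# Hypothetical toy codebook with 4 codes in 2-D, and 3 encoder outputs to quantize.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z_e = np.array([[0.9, 0.1], [0.2, 0.1], [0.1, 0.8]])
indices, z_q = quantize(z_e, codebook)
print(indices)  # [1 0 2]
```

In a full VQ-VAE, the argmin is not differentiable. Gradients are therefore passed to the encoder with a straight-through estimator, and codebook and commitment losses train the codebook. Neither mechanism is shown here.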
Source URL	http://ir.ia.ac.cn/handle/173211/57333
Collection	Research Center for Digital Content Technology and Services / Auditory Models and Cognitive Computing
Corresponding authors	Xing Dengpeng; Xu Bo
Affiliations	1. School of Future Technology, University of Chinese Academy of Sciences
2. School of Artificial Intelligence, University of Chinese Academy of Sciences
3. Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	Meng Linghui, Ruan Jingqing, Xiong Xuantang, et al. M3: Modularization for Multi-task and Multi-agent Offline Pre-training[C]. In:. London, United Kingdom. 2023.5.29-2023.6.2.

Ingestion method: OAI harvesting

Source: Institute of Automation

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.