Chinese Academy of Sciences Institutional Repositories Grid
Training Large Language Models to Follow System Prompt with Self-Supervised Fine-Tuning

Document Type: Conference Paper

Authors: Junyan Qiu 2,3; Haitao Wang 1; Yiping Yang 2
Publication Date: 2024-03
Conference Date: 2024-07
Conference Venue: Yokohama, Japan
Keywords: large language models; supervised fine-tuning; instruction tuning; stylized generation
English Abstract:

In the realm of artificial intelligence, system prompts stand as directives or requests aimed at guiding systems, such as programming environments or AI models, to execute specific tasks or operations. Typically positioned at the commencement of input sequences in large language models, these prompts play a pivotal role in shaping the model's response and guiding its interaction flow. However, a notable challenge emerges during multi-turn dialogues, where these models gradually diverge from adhering to the initial system prompt, leading to inconsistencies in the dialogue. In this paper, we present a scalable framework facilitating the adherence of language models to system prompts through automated data construction. Our approach, termed SELF-SUPERVISED SYSTEM PROMPT FINE-TUNING (S3FT), begins by prompting a language model to modify real dialogue responses to fit a specific system prompt, using stylized translation. Subsequently, we select a small sample of these responses for human preference annotation. This annotated data is used to train the language model to act as a discriminator, identifying high-quality examples that are then employed in further supervised fine-tuning. Experimental results on several datasets demonstrate that applying our method to LLaMA2 and ChatGLM improves human preference rates by over 50%, and outperforms ChatGPT and GPT-4 by a considerable margin. The source code of our paper is available in S3FT-repo.
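
The abstract describes a three-stage data pipeline: restyle real dialogue responses to match a system prompt, annotate a small sample of the rewrites for preference, then use a trained discriminator to filter examples for supervised fine-tuning. The sketch below illustrates that flow only; the function names, prompt template, and toy stand-ins are assumptions for illustration and are not taken from the authors' S3FT-repo code.

```python
# Hypothetical sketch of an S3FT-style data pipeline (illustrative names only).
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DialogueTurn:
    user: str
    response: str


def rewrite_to_system_prompt(generate: Callable[[str], str],
                             system_prompt: str,
                             turn: DialogueTurn) -> str:
    """Stage 1: ask a language model to restyle a real response so it follows the system prompt."""
    instruction = (
        f"System prompt: {system_prompt}\n"
        f"User: {turn.user}\n"
        f"Original response: {turn.response}\n"
        "Rewrite the response so it strictly follows the system prompt."
    )
    return generate(instruction)


def filter_with_discriminator(score: Callable[[str, str], float],
                              system_prompt: str,
                              candidates: List[str],
                              threshold: float = 0.5) -> List[str]:
    """Stage 3: keep only rewrites the (preference-trained) discriminator rates as high quality."""
    return [c for c in candidates if score(system_prompt, c) >= threshold]


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without a real model or annotation step.
    toy_generate = lambda p: "Ahoy! " + p.split("Original response: ")[1].split("\n")[0]
    toy_score = lambda sp, resp: 1.0 if resp.startswith("Ahoy") else 0.0

    system_prompt = "Always answer like a pirate."
    turns = [DialogueTurn("What is the capital of Japan?",
                          "The capital of Japan is Tokyo.")]

    rewrites = [rewrite_to_system_prompt(toy_generate, system_prompt, t) for t in turns]
    kept = filter_with_discriminator(toy_score, system_prompt, rewrites)
    print(kept)  # surviving examples would then feed supervised fine-tuning
```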

Proceedings Publisher: IEEE
Source URL: http://ir.ia.ac.cn/handle/173211/57413
Collection: Research Center for Integrated Information Systems_Visual Perception Fusion and Its Applications
Corresponding Author: Junyan Qiu
Author Affiliations:
1. Meituan
2. Institute of Automation, Chinese Academy of Sciences
3. University of Chinese Academy of Sciences
Recommended Citation
GB/T 7714
Junyan Qiu, Haitao Wang, Yiping Yang. Training Large Language Models to Follow System Prompt with Self-Supervised Fine-Tuning[C]. In: Yokohama, Japan. 2024-07.

Deposit Method: OAI Harvesting

Source: Institute of Automation

