Fine-tuning large language models for chemical text mining
Document Type: Journal Article
Authors | Zhang, Wei7,8; Wang, Qinggong6; Kong, Xiangtai7,8; Xiong, Jiacheng7,8; Ni, Shengkun7,8; Cao, Duanhua5,8; Niu, Buying7,8; Chen, Mingan3,4,8; Li, Yameng2; Zhang, Runze7,8
Journal | CHEMICAL SCIENCE
Publication Date | 2024-06-07
Pages | 12
ISSN | 2041-6520
DOI | 10.1039/d4sc00924j
Corresponding Authors | Fu, Zunyun (fuzunyun@simm.ac.cn); Zheng, Mingyue (myzheng@simm.ac.cn)
Abstract | Extracting knowledge from complex and diverse chemical texts is a pivotal task for both experimental and computational chemists. The task remains extremely challenging due to the complexity of the chemical language and scientific literature. This study explored the power of fine-tuned large language models (LLMs) on five intricate chemical text mining tasks: compound entity recognition, reaction role labelling, metal-organic framework (MOF) synthesis information extraction, nuclear magnetic resonance spectroscopy (NMR) data extraction, and the conversion of reaction paragraphs to action sequences. The fine-tuned LLMs demonstrated impressive performance, significantly reducing the need for repetitive and extensive prompt engineering experiments. For comparison, we guided ChatGPT (GPT-3.5-turbo) and GPT-4 with prompt engineering, and fine-tuned GPT-3.5-turbo as well as other open-source LLMs such as Mistral, Llama3, Llama2, T5, and BART. The results showed that the fine-tuned ChatGPT models excelled in all tasks, achieving exact accuracy levels ranging from 69% to 95% with minimal annotated data. They even outperformed models that used task-adaptive pre-training and fine-tuning on a significantly larger amount of in-domain data. Notably, fine-tuned Mistral and Llama3 showed competitive performance. Given their versatility, robustness, and low-code capability, leveraging fine-tuned LLMs as flexible and effective toolkits for automated data acquisition could revolutionize chemical knowledge extraction. Extracting knowledge from complex chemical texts is essential for both experimental and computational chemists. Fine-tuned large language models (LLMs) can serve as flexible and effective extractors for automated data acquisition.
Funding | National Natural Science Foundation of China[T2225002] ; National Natural Science Foundation of China[82273855] ; National Natural Science Foundation of China[82204278] ; National Natural Science Foundation of China[2022YFC3400504] ; National Natural Science Foundation of China[E2G805H] ; National Key Research and Development Program of China[2023693] ; Shanghai Post-doctoral Excellence Program ; Shanghai Municipal Science and Technology Major Project
WOS Research Area | Chemistry
Language | English
WOS Accession Number | WOS:001246293600001
Publisher | ROYAL SOC CHEMISTRY
Source URL | [http://119.78.100.183/handle/2S10ELR8/311431]
Collection | State Key Laboratory of Drug Research
Affiliations |
1. Ludwig Maximilians Univ Munchen, Med Klin & Poliklin 1, Klinikum Univ Munchen, Munich, Germany
2. ProtonUnfold Technol Co Ltd, Suzhou, Peoples R China
3. Lingang Lab, Shanghai 200031, Peoples R China
4. ShanghaiTech Univ, Sch Phys Sci & Technol, Shanghai 201210, Peoples R China
5. Zhejiang Univ, Innovat Inst Artificial Intelligence Med, Coll Pharmaceut Sci, Hangzhou 310058, Zhejiang, Peoples R China
6. Nanjing Univ Chinese Med, 138 Xianlin Rd, Nanjing 210023, Peoples R China
7. Univ Chinese Acad Sci, 19A Yuquan Rd, Beijing 100049, Peoples R China
8. Chinese Acad Sci, Shanghai Inst Mat Med, Drug Discovery & Design Ctr, State Key Lab Drug Res, 555 Zuchongzhi Rd, Shanghai 201203, Peoples R China
Recommended Citation (GB/T 7714) | Zhang, Wei, Wang, Qinggong, Kong, Xiangtai, et al. Fine-tuning large language models for chemical text mining[J]. CHEMICAL SCIENCE, 2024: 12.
APA | Zhang, Wei, Wang, Qinggong, Kong, Xiangtai, Xiong, Jiacheng, Ni, Shengkun, ... & Zheng, Mingyue. (2024). Fine-tuning large language models for chemical text mining. CHEMICAL SCIENCE, 12.
MLA | Zhang, Wei, et al. "Fine-tuning large language models for chemical text mining." CHEMICAL SCIENCE (2024): 12.
Ingest Method: OAI harvesting
Source: Shanghai Institute of Materia Medica
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.