NMRExtractor: leveraging large language models to construct an experimental NMR database from open-source scientific publications
Document type: Journal article
| Authors | Wang, Qinggong (5,6); Zhang, Wei (4,5); Chen, Mingan (2,3,5); Li, Xutong (4,5); Xiong, Zhaoping (1); Xiong, Jiacheng (5); Fu, Zunyun (3); Zheng, Mingyue (4,5,6) |
| Journal | CHEMICAL SCIENCE |
| Publication date | 2025-06-25 |
| Volume | 16; Issue: 25; Pages: 11548-11558 |
| ISSN | 2041-6520 |
| DOI | 10.1039/d4sc08802f |
| Abstract | Nuclear magnetic resonance (NMR) spectroscopy is crucial for elucidating molecular structures, but NMR data extraction remains largely manual and time-consuming. We developed NMRExtractor, a locally deployable tool using a fine-tuned large language model, to address this challenge. By processing 5 734 869 open-source scientific publications, we created NMRBank, a dataset containing 225 809 entries with compound IUPAC names, NMR conditions, 1H and 13C NMR chemical shifts, data confidence levels, and reference information. Our analysis reveals that NMRBank's chemical space significantly surpasses existing public NMR datasets. The extraction process is highly scalable, allowing automatic processing of new research papers and continuous updates to NMRBank. This approach not only expands the available open NMR data space but also provides a foundation for AI-based NMR predictions and related chemical research. By automating data extraction and creating a comprehensive, regularly updated NMR database, NMRExtractor and NMRBank address the scarcity of publicly available experimental NMR data, potentially accelerating progress in various fields of chemical research. |
| Funding | National Natural Science Foundation of China[T2225002] ; National Natural Science Foundation of China[82273855] ; National Natural Science Foundation of China[82204278] ; National Key Research and Development Program of China[2022YFC3400504] ; SIMM-SHUTCM Traditional Chinese Medicine Innovation Joint Research Program[E2G805H] ; Shanghai Post-doctoral Excellence Program[2023693] ; Shanghai Municipal Science and Technology Major Project ; [2024707] |
| WOS research area | Chemistry |
| Language | English |
| WOS accession number | WOS:001497386600001 |
| Publisher | ROYAL SOC CHEMISTRY |
| Source URL | [http://119.78.100.183/handle/2S10ELR8/318165] |
| Collection | State Key Laboratory of Drug Research |
| Corresponding authors | Xiong, Jiacheng; Fu, Zunyun; Zheng, Mingyue |
| Author affiliations | 1. ProtonUnfold Technol Co Ltd, Suzhou, Peoples R China; 2. Lingang Lab, Shanghai 200031, Peoples R China; 3. ShanghaiTech Univ, Shanghai 201210, Peoples R China; 4. Univ Chinese Acad Sci, 19A Yuquan Rd, Beijing 100049, Peoples R China; 5. Chinese Acad Sci, Shanghai Inst Mat Med, Drug Discovery & Design Ctr, State Key Lab Drug Res, 555 Zuchongzhi Rd, Shanghai 201203, Peoples R China; 6. Nanjing Univ Chinese Med, 138 Xianlin Rd, Nanjing 210023, Peoples R China |
| Recommended citation (GB/T 7714) | Wang, Qinggong, Zhang, Wei, Chen, Mingan, et al. NMRExtractor: leveraging large language models to construct an experimental NMR database from open-source scientific publications[J]. CHEMICAL SCIENCE, 2025, 16(25): 11548-11558. |
| APA | Wang, Qinggong, Zhang, Wei, Chen, Mingan, Li, Xutong, Xiong, Zhaoping, ... & Zheng, Mingyue. (2025). NMRExtractor: leveraging large language models to construct an experimental NMR database from open-source scientific publications. CHEMICAL SCIENCE, 16(25), 11548-11558. |
| MLA | Wang, Qinggong, et al. "NMRExtractor: leveraging large language models to construct an experimental NMR database from open-source scientific publications." CHEMICAL SCIENCE 16.25 (2025): 11548-11558. |
Ingest method: OAI harvesting
Source: Shanghai Institute of Materia Medica


