CAS Institutional Repository Grid: An LLM-Based Method for Quality Information Extraction From Web Text for Crowed-Sensing Spatiotemporal Data

An LLM-Based Method for Quality Information Extraction From Web Text for Crowed-Sensing Spatiotemporal Data

Document type: Journal article


Authors	Huang, Zongcai (2); Peng, Peng (3); Lu, Feng (3,4,5); Zhang, He (1)
Journal	TRANSACTIONS IN GIS
Publication date	2025-02-01
Volume	29, Issue 1, Pages e13294
Keywords	crowd-sensing; large language model; prompt fine-tuning technique; quality indicators; spatiotemporal data
ISSN	1361-1682
DOI	10.1111/tgis.13294
Institution rank	2
Document subtype	Article
Abstract	Knowledge-driven GIS increasingly requires multi-source, multi-type, and multi-model crowd-sensing spatiotemporal data, whose quality is difficult to guarantee and assess. Extracting quality indicator information, which is widely present in unstructured web texts, is therefore crucial for supplementing the quality information of crowd-sensing spatiotemporal data. Recent advances in large language models (LLMs) show potential for extracting such information. However, it remains difficult to obtain accurate results from LLMs across the different quality indicators of crowd-sensing spatiotemporal data. We therefore designed an LLM fine-tuned to extract spatiotemporal quality information from quality description text (LLMFT-STQIE). First, we establish a quality indicator vocabulary to determine whether a text contains quality indicator information about spatiotemporal data. Then, building on LLM fine-tuning, we create a two-stage prompt model with QILE and QIVE prompts, each comprising the input text, task type, instructions, the quality indicator vocabulary, the output format, and a reference case. The results show that LLMFT-STQIE achieves an accuracy of 91% and a recall of 80%, improvements of 23% and 38%, respectively, over untuned LLMs. These results indicate that the proposed method extracts quality indicator information from web texts for crowd-sensing spatiotemporal data easily and accurately. The study also informs strategies for optimizing large language models for specific scenarios and tasks.
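The abstract describes the pipeline only at a high level. The following Python sketch illustrates one way vocabulary screening and two-stage QILE/QIVE prompting could fit together. Everything concrete in it is an assumption rather than the authors' implementation: the vocabulary entries, the reading of QILE as the stage that labels which indicators a text mentions and QIVE as the stage that extracts their values (the record does not expand either acronym), the prompt wording, and the `llm` callable, which is a hypothetical stand-in for the fine-tuned model.

```python
import json
from typing import Callable, Dict, List, Optional

# HYPOTHETICAL vocabulary (indicator -> trigger terms). The record does not list the
# paper's actual quality indicator vocabulary; these entries are illustrative only.
QUALITY_VOCAB: Dict[str, List[str]] = {
    "positional accuracy": ["positional accuracy", "position error", "location error"],
    "temporal accuracy": ["temporal accuracy", "time lag", "update frequency"],
    "completeness": ["completeness", "coverage", "missing records"],
    "logical consistency": ["logical consistency", "topology error"],
}


def matched_indicators(text: str) -> List[str]:
    """Screening step: indicators whose trigger terms occur in the text (case-insensitive)."""
    lowered = text.lower()
    return [name for name, terms in QUALITY_VOCAB.items()
            if any(term in lowered for term in terms)]


def build_prompt(task_type: str, text: str, indicators: Optional[List[str]] = None) -> str:
    """Assemble a prompt from the six components named in the abstract.

    ASSUMPTION: "QILE" labels which indicators appear; "QIVE" extracts their values.
    """
    example = "Position error is below 5 m."  # hypothetical reference case
    if task_type == "QILE":
        instructions = "List the vocabulary indicators that the input text mentions."
        output_format = {"indicators": ["<indicator>"]}
        answer = {"indicators": ["positional accuracy"]}
    elif task_type == "QIVE":
        instructions = "For each vocabulary indicator, extract the value and unit stated in the text."
        output_format = {"values": [{"indicator": "<indicator>", "value": "<value>", "unit": "<unit>"}]}
        answer = {"values": [{"indicator": "positional accuracy", "value": "5", "unit": "m"}]}
    else:
        raise ValueError(f"unknown task type: {task_type}")
    vocab = indicators if indicators is not None else list(QUALITY_VOCAB)
    reference = json.dumps({"text": example, "output": answer})
    return "\n".join([
        f"Task type: {task_type}",
        f"Instructions: {instructions}",
        f"Quality indicator vocabulary: {', '.join(vocab)}",
        f"Output format (JSON): {json.dumps(output_format)}",
        f"Reference case: {reference}",
        f"Input text: {text}",
    ])


def extract_quality_info(text: str, llm: Callable[[str], str]) -> Optional[List[dict]]:
    """Two-stage extraction; `llm` is a HYPOTHETICAL stand-in for the fine-tuned model."""
    if not matched_indicators(text):
        return None  # no vocabulary term found: skip both LLM calls
    labels = json.loads(llm(build_prompt("QILE", text)))["indicators"]
    if not labels:
        return []  # stage 1 found no indicators, so there is nothing to extract
    # Stage 2 restricts the vocabulary to the indicators labeled in stage 1.
    return json.loads(llm(build_prompt("QIVE", text, labels)))["values"]
```

Screening with the vocabulary before any model call means texts without indicator terms never reach the LLM, and passing the stage-1 labels into the stage-2 prompt narrows what the model is asked to extract. Plain substring matching is a simplification; a real vocabulary would likely need tokenization and synonym handling.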
WoS research area	Geography
Language	English
WoS accession number	WOS:001396302900001
Publisher	WILEY
Source URL	http://ir.igsnrr.ac.cn/handle/311030/211380
Collection	State Key Laboratory of Resources and Environmental Information System: Foreign-Language Papers
Corresponding author	Peng, Peng
Affiliations	1. Natl Qual Inspect & Testing Ctr Surveying & Mappin, Beijing, Peoples R China; 2. Xiamen Univ Technol, Xiamen, Peoples R China; 3. Chinese Acad Sci, Inst Geog Sci & Nat Resources Res, State Key Lab Resources & Environm Informat Syst, Beijing, Peoples R China; 4. Fuzhou Univ, Acad Digital China, Fuzhou, Peoples R China; 5. Jiangsu Ctr Collaborat Innovat Geog Informat Resou, Nanjing, Peoples R China
Recommended citation (GB/T 7714)	Huang, Zongcai, Peng, Peng, Lu, Feng, et al. An LLM-Based Method for Quality Information Extraction From Web Text for Crowed-Sensing Spatiotemporal Data[J]. TRANSACTIONS IN GIS, 2025, 29(1): e13294.
APA	Huang, Z., Peng, P., Lu, F., & Zhang, H. (2025). An LLM-Based Method for Quality Information Extraction From Web Text for Crowed-Sensing Spatiotemporal Data. Transactions in GIS, 29(1), e13294.
MLA	Huang, Zongcai, et al. "An LLM-Based Method for Quality Information Extraction From Web Text for Crowed-Sensing Spatiotemporal Data." TRANSACTIONS IN GIS 29.1 (2025): e13294.

Deposit method: OAI harvesting

Source: Institute of Geographic Sciences and Natural Resources Research
