Chinese Academy of Sciences Institutional Repositories Grid
An LLM-Based Method for Quality Information Extraction From Web Text for Crowed-Sensing Spatiotemporal Data

Document type: Journal article

Authors: Huang, Zongcai [2]; Peng, Peng [3]; Lu, Feng [3,4,5]; Zhang, He [1]
Journal: TRANSACTIONS IN GIS
Publication date: 2025-02-01
Volume: 29; Issue: 1; Pages: e13294
Keywords: crowd-sensing large language model prompt fine-tuning technique quality indicators spatiotemporal data
ISSN: 1361-1682
DOI: 10.1111/tgis.13294
Ownership ranking: 2
Document subtype: Article
Abstract: Knowledge-driven GIS increasingly requires multi-source, multi-type, and multi-model crowd-sensing spatiotemporal data, whose data quality is difficult to guarantee and determine. Hence, extracting quality indicator information, widely present in various unstructured web texts, is crucial to providing supplementary quality information for crowd-sensing spatiotemporal data. Recent advances in large language models show potential in extracting quality indicator information. However, it is still hard to get accurate results from large language models that use different quality indicators for crowd-sensing spatiotemporal data. Therefore, we have designed a large language model that is fine-tuned for the extraction of spatiotemporal quality information from quality description text (LLMFT-STQIE). Firstly, we establish a quality indicator vocabulary to determine whether the text includes quality indicator information from the spatiotemporal data. Then, we create a two-stage prompt model with QILE and QIVE prompts that include input text, task type, instructions, the quality indicator vocabulary, output format, and a reference case. This model is based on the fine-tuning technology of large language models. The results show that our LLMFT-STQIE achieves an accuracy of 91% and a recall rate of 80%, respectively, representing improvements of 23% and 38% compared to untuned large language models. These results further show that the suggested method easily and accurately extracts quality indicator information from web texts for crowd-sensing spatiotemporal data. The study helps investigate strategies for optimizing huge language models for specific scenarios or task specifications.
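WOS research area: Geography
Language: English
WOS accession number: WOS:001396302900001
Publisher: WILEY
Source URL: [http://ir.igsnrr.ac.cn/handle/311030/211380]
Collection: State Key Laboratory of Resources and Environmental Information System_Foreign-Language Papers
Corresponding author: Peng, Peng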
Author affiliations:
1. Natl Qual Inspect & Testing Ctr Surveying & Mappin, Beijing, Peoples R China
2. Xiamen Univ Technol, Xiamen, Peoples R China
3. Chinese Acad Sci, Inst Geog Sci & Nat Resources Res, State Key Lab Resources & Environm Informat Syst, Beijing, Peoples R China
4. Fuzhou Univ, Acad Digital China, Fuzhou, Peoples R China
5. Jiangsu Ctr Collaborat Innovat Geog Informat Resou, Nanjing, Peoples R China
Recommended citation
GB/T 7714: Huang, Zongcai, Peng, Peng, Lu, Feng, et al. An LLM-Based Method for Quality Information Extraction From Web Text for Crowed-Sensing Spatiotemporal Data[J]. TRANSACTIONS IN GIS, 2025, 29(1): e13294.
APA: Huang, Zongcai, Peng, Peng, Lu, Feng, & Zhang, He. (2025). An LLM-Based Method for Quality Information Extraction From Web Text for Crowed-Sensing Spatiotemporal Data. TRANSACTIONS IN GIS, 29(1), e13294.
MLA: Huang, Zongcai, et al. "An LLM-Based Method for Quality Information Extraction From Web Text for Crowed-Sensing Spatiotemporal Data". TRANSACTIONS IN GIS 29.1 (2025): e13294.

Ingestion method: OAI harvesting

Source: Institute of Geographic Sciences and Natural Resources Research


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.