An LLM-Based Method for Quality Information Extraction From Web Text for Crowed-Sensing Spatiotemporal Data
Document type: Journal article
Authors | Huang, Zongcai2; Peng, Peng3; Lu, Feng3,4,5; Zhang, He1 |
Journal | TRANSACTIONS IN GIS |
Publication date | 2025-02-01 |
Volume | 29; Issue: 1; Pages: e13294 |
Keywords | crowd-sensing; large language model; prompt fine-tuning technique; quality indicators; spatiotemporal data |
ISSN | 1361-1682 |
DOI | 10.1111/tgis.13294 |
Institutional affiliation rank | 2 |
Document subtype | Article |
Abstract | Knowledge-driven GIS increasingly requires multi-source, multi-type, and multi-model crowd-sensing spatiotemporal data, whose data quality is difficult to guarantee and determine. Hence, extracting quality indicator information, widely present in various unstructured web texts, is crucial to providing supplementary quality information for crowd-sensing spatiotemporal data. Recent advances in large language models show potential in extracting quality indicator information. However, it is still hard to get accurate results from large language models that use different quality indicators for crowd-sensing spatiotemporal data. Therefore, we have designed a large language model that is fine-tuned for the extraction of spatiotemporal quality information from quality description text (LLMFT-STQIE). Firstly, we establish a quality indicator vocabulary to determine whether the text includes quality indicator information from the spatiotemporal data. Then, we create a two-stage prompt model with QILE and QIVE prompts that include input text, task type, instructions, the quality indicator vocabulary, output format, and a reference case. This model is based on the fine-tuning technology of large language models. The results show that our LLMFT-STQIE achieves an accuracy of 91% and a recall rate of 80%, respectively, representing improvements of 23% and 38% compared to untuned large language models. These results further show that the suggested method easily and accurately extracts quality indicator information from web texts for crowd-sensing spatiotemporal data. The study helps investigate strategies for optimizing huge language models for specific scenarios or task specifications. |
WOS research area | Geography |
Language | English |
WOS accession number | WOS:001396302900001 |
Publisher | WILEY |
Source URL | [http://ir.igsnrr.ac.cn/handle/311030/211380] |
Collection | State Key Laboratory of Resources and Environmental Information System_Foreign-language papers
Corresponding author | Peng, Peng
Author affiliations | 1.Natl Qual Inspect & Testing Ctr Surveying & Mappin, Beijing, Peoples R China; 2.Xiamen Univ Technol, Xiamen, Peoples R China; 3.Chinese Acad Sci, Inst Geog Sci & Nat Resources Res, State Key Lab Resources & Environm Informat Syst, Beijing, Peoples R China; 4.Fuzhou Univ, Acad Digital China, Fuzhou, Peoples R China; 5.Jiangsu Ctr Collaborat Innovat Geog Informat Resou, Nanjing, Peoples R China; |
Recommended citation (GB/T 7714) | Huang, Zongcai,Peng, Peng,Lu, Feng,et al. An LLM-Based Method for Quality Information Extraction From Web Text for Crowed-Sensing Spatiotemporal Data[J]. TRANSACTIONS IN GIS,2025,29(1):e13294. |
APA | Huang, Zongcai,Peng, Peng,Lu, Feng,&Zhang, He.(2025).An LLM-Based Method for Quality Information Extraction From Web Text for Crowed-Sensing Spatiotemporal Data.TRANSACTIONS IN GIS,29(1),e13294. |
MLA | Huang, Zongcai,et al."An LLM-Based Method for Quality Information Extraction From Web Text for Crowed-Sensing Spatiotemporal Data".TRANSACTIONS IN GIS 29.1(2025):e13294. |
Ingestion method: OAI harvesting
Source: Institute of Geographic Sciences and Natural Resources Research
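
The abstract describes a vocabulary check followed by a two-stage prompt (QILE, then QIVE) built from six components: input text, task type, instructions, the quality indicator vocabulary, output format, and a reference case. As a rough illustration only, the sketch below assembles such prompts in Python. It is not the authors' implementation: the function names, the vocabulary entries, the instruction wording, and the reference cases are all hypothetical placeholders, the abstract does not expand the QILE and QIVE acronyms, and no model is called.

```python
# Hypothetical placeholder vocabulary; the paper's actual quality indicator
# vocabulary is not listed in the abstract.
QUALITY_VOCABULARY = ["positional accuracy", "completeness", "update frequency"]


def mentions_quality_indicator(text, vocabulary):
    """Return the vocabulary terms that appear in the text (case-insensitive)."""
    lowered = text.lower()
    return [term for term in vocabulary if term in lowered]


def build_prompt(task_type, text, vocabulary, instructions, output_format, reference_case):
    """Join the six prompt components named in the abstract into one string."""
    parts = [
        f"Task type: {task_type}",
        f"Instructions: {instructions}",
        "Quality indicator vocabulary: " + ", ".join(vocabulary),
        f"Output format: {output_format}",
        f"Reference case: {reference_case}",
        f"Input text: {text}",
    ]
    return "\n".join(parts)


sample = "The road network has a positional accuracy of 5 m."

# Step 1: vocabulary check decides whether the text is worth prompting on.
matched = mentions_quality_indicator(sample, QUALITY_VOCABULARY)

if matched:
    # Step 2: first-stage prompt (assumed wording and reference case).
    qile_prompt = build_prompt(
        task_type="QILE",
        text=sample,
        vocabulary=QUALITY_VOCABULARY,
        instructions="List the quality indicators mentioned in the input text.",
        output_format='JSON list of strings, e.g. ["completeness"]',
        reference_case='"Coverage is 95% complete." -> ["completeness"]',
    )
    # Step 3: second-stage prompt, restricted to the indicators found above.
    qive_prompt = build_prompt(
        task_type="QIVE",
        text=sample,
        vocabulary=matched,
        instructions="Give the stated value for each listed quality indicator.",
        output_format='JSON object, e.g. {"completeness": "95%"}',
        reference_case='"Coverage is 95% complete." -> {"completeness": "95%"}',
    )
    print(qile_prompt)
    print(qive_prompt)
```

In a fine-tuning setup, strings like these would typically be paired with expected outputs to form training examples; how the authors actually structure their training data is not stated in this record.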