A self-training semi-supervised machine learning method for predictive mapping of soil classes with limited sample data
Document type: Journal article
Authors | Zhang, Lei1,4 |
Journal | GEODERMA |
Publication date | 2021-02-15 |
Volume | 384 Pages: 10 |
Keywords | Digital soil sampling ; Machine learning ; Semi-supervised learning ; Self-training ; Predictive mapping |
ISSN | 0016-7061 |
DOI | 10.1016/j.geoderma.2020.114809 |
Corresponding author | Yang, Lin(yanglin@nju.edu.cn) |
Abstract | Numerous machine learning models have been developed to model the relationship between soil classes or properties and their environmental covariates in digital soil mapping (DSM). Most machine learning models are trained with a supervised learning (SL) method based on training samples. However, the collected sample data are often limited in practice because field sampling is expensive and time-consuming. Insufficient samples may substantially limit the learning ability of the model. Semi-supervised machine learning, a new machine learning paradigm that uses both unsampled data and a small amount of sampled data in the learning process, is a potentially effective method for DSM. In this study, we present a self-training semi-supervised learning (SSL) method for DSM. Unlike the SL method, the SSL method uses not only the sampled locations but also the abundant environmental covariate information at the unvisited locations. Its basic idea is to iteratively enlarge the training data set by adding unsampled points with high prediction confidence from the unvisited locations until a stopping criterion is reached. The proposed SSL method was applied to machine learning models for predicting soil classes in Heshan Farm of Nenjiang County in Heilongjiang Province, China. Three machine learning models, multinomial logistic regression (MLR), k-nearest neighbor (KNN), and random forest (RF), were selected to evaluate the efficiency of the SSL method. The entropy threshold is an important parameter in the SSL method, and a sensitivity analysis on this parameter was conducted using a series of entropy thresholds. The SSL method was compared with the SL method for the three machine learning models for soil prediction. Cross-validation was employed to evaluate the accuracy of the predicted soil class maps generated by each method.
The results showed that the prediction accuracies (the proportion of correctly predicted samples over the total number of validation samples) of the SSL method were higher than those of the SL method for MLR, KNN, and RF by 5.9%, 12.2%, and 6.0%, respectively. RF-SSL was the most accurate model in the study area, followed by KNN-SSL. Meanwhile, the self-training SSL method yielded the largest improvement for the KNN model compared with the other two models. Furthermore, the soil maps predicted using the SSL method showed a more reasonable spatial variation pattern of soil classes. In the study area, a suitable value of the entropy threshold was 0.8 to 1.0. We concluded that the SSL method improved the soil prediction accuracy compared with the SL method when applying machine learning models for DSM, and is thus a potentially efficient method for DSM with limited sample data. |
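The self-training loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the entropy is computed or normalized, what the stopping criterion is, or how the classifiers are configured. The sketch assumes Shannon entropy with the natural logarithm over KNN class proportions, stops when no unlabelled point passes the threshold (or after `max_iter` rounds), and uses toy two-dimensional points in place of environmental covariates. The names `knn_proba`, `entropy`, `self_train`, `threshold`, and `max_iter` are hypothetical, and the threshold scale here need not correspond to the 0.8 to 1.0 range reported in the abstract.

```python
import math
from collections import Counter


def knn_proba(train_X, train_y, x, classes, k=3):
    """Share of each class among the k labelled points nearest to x."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    counts = Counter(label for _, label in ranked[:k])
    n = min(k, len(ranked))
    return [counts[c] / n for c in classes]


def entropy(probs):
    """Shannon entropy (natural log); 0 means a fully confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def self_train(X_lab, y_lab, X_unlab, threshold=0.5, k=3, max_iter=10):
    """Iteratively move confidently predicted points into the training set.

    Assumed stopping rule (not from the paper): stop when no unlabelled
    point has entropy <= threshold, when the pool is empty, or after
    max_iter rounds.
    """
    classes = sorted(set(y_lab))
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_iter):
        confident, remaining = [], []
        for x in pool:
            probs = knn_proba(X_lab, y_lab, x, classes, k)
            if entropy(probs) <= threshold:
                # Low entropy: accept the most probable class as a pseudo-label.
                confident.append((x, classes[probs.index(max(probs))]))
            else:
                remaining.append(x)
        if not confident:
            break
        for x, label in confident:
            X_lab.append(x)
            y_lab.append(label)
        pool = remaining
        if not pool:
            break
    return X_lab, y_lab, pool


# Toy data: two well-separated classes plus one ambiguous point.
X_lab = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
y_lab = ["A", "A", "A", "B", "B", "B"]
X_unlab = [(0.4, 0.6), (10.6, 10.4), (6, 4.5)]

X_new, y_new, pool = self_train(X_lab, y_lab, X_unlab, threshold=0.5, k=3)
print(len(X_new), y_new[len(y_lab):], pool)
```

In this toy run the two points lying inside a cluster receive pseudo-labels, while the point between the clusters stays in the unlabelled pool because its neighbours disagree. A looser threshold would admit more pseudo-labels at the risk of adding wrong ones, which is presumably why the abstract reports a sensitivity analysis on this parameter.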
WOS keywords | SPATIAL PREDICTION ; RANDOM FORESTS ; REGRESSION ; CLASSIFICATION ; RESOLUTION ; LANDSCAPE ; REGION ; STOCKS ; MAP |
Funding projects | National Natural Science Foundation of China[41971054] ; National Natural Science Foundation of China[41530749] ; National Natural Science Foundation of China[41871300] |
WOS research area | Agriculture |
Language | English |
WOS accession number | WOS:000594244300014 |
Publisher | ELSEVIER |
Funding organization | National Natural Science Foundation of China |
Source URL | [http://ir.igsnrr.ac.cn/handle/311030/136524] |
Collection | Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences |
Corresponding author | Yang, Lin |
Author affiliations | 1.Nanjing Univ, Sch Geog & Ocean Sci, Nanjing 210023, Peoples R China ; 2.Chinese Acad Sci, Inst Geog Sci & Nat Resources Res, State Key Lab Resources & Environm Informat Syst, Beijing 100101, Peoples R China ; 3.Nanjing Normal Univ, Sch Geog, Nanjing 210023, Peoples R China ; 4.Jiangsu Ctr Collaborat Innovat Geog Informat Reso, Nanjing 210023, Peoples R China ; 5.Nanjing Normal Univ, Minist Educ, Key Lab Virtual Geog Environm, Nanjing 210023, Peoples R China |
Recommended citation GB/T 7714 | Zhang, Lei, Yang, Lin, Ma, Tianwu, et al. A self-training semi-supervised machine learning method for predictive mapping of soil classes with limited sample data[J]. GEODERMA, 2021, 384:10. |
APA | Zhang, Lei, Yang, Lin, Ma, Tianwu, Shen, Feixue, Cai, Yanyan, & Zhou, Chenghu. (2021). A self-training semi-supervised machine learning method for predictive mapping of soil classes with limited sample data. GEODERMA, 384, 10. |
MLA | Zhang, Lei, et al. "A self-training semi-supervised machine learning method for predictive mapping of soil classes with limited sample data". GEODERMA 384 (2021): 10. |
Ingestion method: OAI harvesting
Source: Institute of Geographic Sciences and Natural Resources Research