Latin Hypercube Sampling for Assessing the Quality of Legacy Soil Data and Optimizing Soil Sampling.
Alex McBratney1, Florence Carre2, and Budiman Minasny1. (1) The University of Sydney, Faculty of Agriculture, Food & Natural Resources, JRA McMillan Building A05, Sydney, NSW, 2006, Australia, (2) European Commission Joint Research Centre Institute for Environment and Sustainability, Via Fermi, Ispra, Italy
Legacy soil data form an important part of digital soil mapping and are essential for calibrating models that predict soil properties from environmental variables. Legacy soil data arise from traditional soil survey. Methods of soil survey are generally empirical and based on the surveyor's mental model, which correlates soil with underlying geology, landforms, vegetation and air-photo interpretation. Traditional soil sampling follows no statistical criteria, and this usually leads to bias in the areas being sampled. The challenge is to use legacy data for large-scale mapping (e.g. national or continental), because funds for resampling large areas are limited. The problem is then to assess the reliability and quality of soil databases that are mainly populated by traditional soil survey. The next task, if additional funding for sampling becomes available, is to decide where to sample. This sampling can be used to improve and validate the prediction model. Latin Hypercube Sampling (LHS) has been proposed as a sampling tool for digital soil mapping (Minasny and McBratney, 2005). LHS is a stratified-random procedure that provides an efficient way of sampling variables from their multivariate distributions. LHS involves sampling n values from the prescribed distribution of each of the variables. The cumulative distribution for each variable is divided into n equiprobable intervals, and a value is selected randomly from each interval. The n values obtained for each variable are then paired randomly with those of the other variables. This method ensures full coverage of the range of each variable by maximally stratifying the marginal distribution. We use the principle of LHS to assess the quality of existing soil data and to guide us to the areas that need to be sampled. First an area is defined and the empirical environmental data layers are identified on a regular grid. The existing soil data are then matched with the environmental variables.
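The LHS steps described above can be illustrated with a minimal Python sketch. It assumes NumPy and samples on the unit hypercube; the function name `latin_hypercube` and its parameters are illustrative and not taken from the cited work. To obtain values from a prescribed distribution, each column would be passed through that variable's inverse cumulative distribution function.

```python
import numpy as np

def latin_hypercube(n, k, rng=None):
    """Draw n points in k dimensions on the unit hypercube (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    # Split [0, 1) into n equiprobable intervals and draw one random
    # value inside each interval, separately for every variable.
    u = (np.arange(n)[:, None] + rng.random((n, k))) / n
    # Shuffle each column independently so that the values of one
    # variable are paired at random with those of the others.
    for j in range(k):
        u[:, j] = rng.permutation(u[:, j])
    return u

samples = latin_hypercube(10, 3, rng=0)
```

Each column of `samples` contains exactly one value in each of the 10 intervals, which is the marginal stratification the method relies on.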
A procedure is then performed to check the occupancy of the survey data in the hypercube of the environmental data space, in order to determine whether the legacy soil survey data occupy the hypercube uniformly or are biased. It also allows us to estimate the probability of an area being surveyed. From this information we can decide which areas of the feature space are not covered and need to be sampled.
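One possible way to check hypercube occupancy is sketched below in Python. This is an assumption-laden illustration, not the authors' procedure: it assumes NumPy, strata defined by quantiles of the gridded covariates, and synthetic data; the function `hypercube_occupancy` and all variable names are hypothetical.

```python
import numpy as np

def hypercube_occupancy(grid, legacy, n_strata=4):
    """Count grid cells and legacy samples per hypercube stratum (sketch)."""
    grid = np.asarray(grid, dtype=float)
    legacy = np.asarray(legacy, dtype=float)
    k = grid.shape[1]
    # Interior quantiles of the gridded covariates define n_strata
    # equiprobable intervals for each variable.
    probs = np.linspace(0, 1, n_strata + 1)[1:-1]
    breaks = np.quantile(grid, probs, axis=0)  # shape (n_strata - 1, k)

    def strata(x):
        # Stratum index (0 .. n_strata - 1) of every row, per variable.
        return np.stack(
            [np.searchsorted(breaks[:, j], x[:, j], side="right") for j in range(k)],
            axis=1,
        )

    shape = (n_strata,) * k
    grid_counts = np.zeros(shape, dtype=int)
    legacy_counts = np.zeros(shape, dtype=int)
    np.add.at(grid_counts, tuple(strata(grid).T), 1)
    np.add.at(legacy_counts, tuple(strata(legacy).T), 1)
    return grid_counts, legacy_counts

# Synthetic example (hypothetical data): a survey biased towards
# low values of the first covariate.
rng = np.random.default_rng(1)
grid = rng.normal(size=(5000, 2))
legacy = grid[grid[:, 0] < 0][:200]

grid_counts, legacy_counts = hypercube_occupancy(grid, legacy, n_strata=4)
expected = grid_counts / grid_counts.sum()      # share of the area per stratum
observed = legacy_counts / legacy_counts.sum()  # share of the samples per stratum
# Strata that exist in the landscape but hold no legacy sample.
unsampled = np.argwhere((grid_counts > 0) & (legacy_counts == 0))
```

Comparing `observed` with `expected` indicates where the legacy data over- or under-represent the environmental space, and `unsampled` lists candidate strata for additional sampling.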