Analysis of Different Statistical Models for Assessing Potential Distribution of Forest Types in Southern Spain.
Maria Anaya (1), Rafael Pino (2), Antonio Jordan (1), Lorena Martinez-Zavala (1), Nicolas Bellinfante (1), and Isidoro Gomez (1). (1) Dept of Crystallography, Mineralogy and Agricultural Chemistry (Univ of Sevilla), Facultad de Quimica, C/Profesor Garcia Gonzalez, 1, Sevilla, 41012, Spain, (2) Dept of Statistics and Operational Research, Facultad de Matematicas, C/Tarfia, s/n, Sevilla, 41012, Spain
Models that evaluate forest systems from environmental information have been built by different authors in very diverse ways. Nevertheless, the accuracy of such models is strongly influenced by the quality and number of variables used, as well as by their mathematical and statistical basis. Following Anaya Romero (2004), the present work addresses the selection of the most accurate prediction model for forest system evaluation. The main goal of this evaluation is to predict the potential distribution of several forest types in the Aracena Mountains (Sierra de Aracena Natural Park) and Western Andévalo (Huelva, Spain). To this end, the selected models relate the presence/absence of each forest type to the main parameters influencing the distribution of forest species under the environmental conditions of the study area. Five models were selected for their high predictive and explanatory capacity: Logistic Regression (LR), Artificial Neural Network (ANN), Decision Tree (DT), Random Forest and Support Vector Machines (SVM). The forest types in the area are oak forest (Quercus suber and Q. rotundifolia), pine forest (Pinus pinaster and P. pinea), eucalyptus forest (Eucalyptus globulus and E. camaldulensis) and deciduous forest. The selected environmental variables were grouped into several thematic categories: lithology (type of rock, acidity and consolidation), geomorphology (erosive processes, mass movements, sedimentation processes and morphogenesis), physiography, relief (elevation, slope and hillslope aspect), soil (pH, soil nutrients, organic matter, CEC and clay content) and climate (average summer precipitation, annual average temperature, average temperature of the warmest month and average temperature of the coldest month). After an exploratory analysis of the data, a sample was drawn and the selected prediction models were applied to it.
To study how each prediction model behaves when new data are used, the whole data set was divided into training data and test data. The division was made randomly, so that 75% of the samples were assigned to the training data and 25% to the test data. The results obtained by the five prediction models were compared using the Kleinbaum confusion matrix. Finally, DT was the method with the lowest Error Index (0.08%).
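The evaluation procedure described above (a random 75%/25% split followed by a comparison based on a confusion matrix) can be illustrated with a minimal Python sketch. It is not the code used in this work: the function names, the random seed and the small presence/absence vectors are hypothetical, and the Error Index is assumed here to be the percentage of misclassified test samples, a definition the abstract does not state.

```python
import random


def split_train_test(samples, train_fraction=0.75, seed=0):
    """Randomly split samples into training and test subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed only for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix(observed, predicted):
    """Count presence (1) / absence (0) outcomes, keyed by (observed, predicted)."""
    counts = {(o, p): 0 for o in (0, 1) for p in (0, 1)}
    for o, p in zip(observed, predicted):
        counts[(o, p)] += 1
    return counts


def error_index(counts):
    """Assumed definition: percentage of misclassified samples (off-diagonal cells)."""
    total = sum(counts.values())
    wrong = counts[(0, 1)] + counts[(1, 0)]
    return 100.0 * wrong / total


# Hypothetical sample identifiers, split 75% / 25%.
samples = list(range(200))
train, test = split_train_test(samples)

# Hypothetical observed and predicted presence/absence of one forest type.
observed = [1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 0, 1, 0, 1, 1]
cm = confusion_matrix(observed, predicted)

print(len(train), len(test))  # 150 50
print(error_index(cm))        # 25.0
```

In such a setup, each of the five models would be fitted on the training data only, and its confusion matrix would be computed on the test data, so that the error indices are comparable across models.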