Pseudo-absences, pseudo-models and pseudo-niches: Pitfalls of model selection based on the area under the curve
Authors: Golicher, D., Ford, A., Cayuela, L. and Newton, A.
Journal: International Journal of Geographical Information Science
Volume: 26
Issue: 11
Pages: 2049-2063
eISSN: 1362-3087
ISSN: 1365-8816
DOI: 10.1080/13658816.2012.719626
Abstract: The area under the curve (AUC) of the receiver operator characteristic (ROC) graph is regarded as an objective measure of the discrimination accuracy of predictive models. AUC scores calculated from background values, or pseudo-absences, have been proposed as a method of model selection for species distribution models (SDMs) fitted to presence-only data. However, the utility of AUC as a measure of model performance when data on confirmed absence are unavailable has not been fully investigated. We fitted SDMs using informative climatic variables for 2000 species of Mesoamerican trees. As a reference, we also built 'pseudo-models' using Gaussian random fields with no biological meaning. AUC correctly selected SDMs fitted to single environmental variables over 'pseudo-models' fitted to single random fields in almost all cases. However, when all seven variables were included in the models, AUC erroneously selected complex pseudo-models over complex climate models in 17% of the cases. The spatial distribution patterns predicted by the pseudo-models differed from the results derived from climate-based models, even when overall AUC scores were similar. Both model and pseudo-model AUC values increased when presence points were few and spatially aggregated. The results show that AUC calculated from presence-only data can be an unreliable guide for model selection. Pseudo-absences have ill-defined properties that challenge the interpretation of AUC values. Inference on multidimensional niche spaces should not be supported by AUC values calculated using pseudo-absences. © 2012 Copyright Taylor and Francis Group, LLC.
Source: Scopus
Pseudo-absences, pseudo-models and pseudo-niches: pitfalls of model selection based on the area under the curve
Authors: Golicher, D., Ford, A., Cayuela, L. and Newton, A.
Journal: INTERNATIONAL JOURNAL OF GEOGRAPHICAL INFORMATION SCIENCE
Volume: 26
Issue: 11
Pages: 2049-2063
eISSN: 1362-3087
ISSN: 1365-8816
DOI: 10.1080/13658816.2012.719626
Source: Web of Science (Lite)
Pseudo-absences, pseudo-models and pseudo-niches: pitfalls of model selection based on the area under the curve
Authors: Golicher, D., Ford, A., Newton, A. and Cayuela, L.
Editors: Laffan, S.
Journal: International Journal of Geographical Information Science
Publisher: Taylor & Francis
Abstract: The area under the curve (AUC) of the receiver operator characteristic (ROC) graph is regarded as an objective measure of the discrimination accuracy of predictive models. AUC scores calculated from background values, or pseudo-absences, have been proposed as a method of model selection for species distribution models (SDMs) fitted to presence-only data. However, the utility of AUC as a measure of model performance when data on confirmed absence are unavailable has not been fully investigated. We fitted SDMs using informative climatic variables for 2000 species of Mesoamerican trees. As a reference, we also built ‘pseudo-models’ using Gaussian random fields with no biological meaning. AUC correctly selected SDMs fitted to single environmental variables over ‘pseudo-models’ fitted to single random fields in almost all cases. However, when all seven variables were included in the models, AUC erroneously selected complex pseudo-models over complex climate models in 17% of the cases. The spatial distribution patterns predicted by the pseudo-models differed from the results derived from climate-based models, even when overall AUC scores were similar. Both model and pseudo-model AUC values increased when presence points were few and spatially aggregated. The results show that AUC calculated from presence-only data can be an unreliable guide for model selection. Pseudo-absences have ill-defined properties that challenge the interpretation of AUC values. Inference on multidimensional niche spaces should not be supported by AUC values calculated using pseudo-absences.
Source: Manual
Preferred by: Andrew Ford and Duncan Golicher
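The quantity at the centre of the abstract is AUC computed by scoring presence records against background points (pseudo-absences) rather than confirmed absences. The Python sketch below assumes the rank-based (Mann-Whitney) definition of AUC: the proportion of presence/background pairs in which the presence has the higher score. It also uses a 1-D moving average of Gaussian noise as a crude, hypothetical stand-in for the Gaussian random fields behind the paper's pseudo-models. The function names `auc_presence_background` and `smoothed_noise`, the transect, and every parameter value are illustrative assumptions. They are not the authors' code or data.

```python
import random
from typing import Sequence


def auc_presence_background(presence: Sequence[float],
                            background: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) AUC, an assumed formulation.

    Returns the proportion of (presence, background) pairs in which the
    presence point has the higher score; ties count one half. Background
    points are pseudo-absences, so some may be unrecorded presences.
    """
    if not presence or not background:
        raise ValueError("need at least one presence and one background score")
    wins = 0.0
    for p in presence:
        for b in background:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence) * len(background))


def smoothed_noise(n: int, window: int, rng: random.Random) -> list[float]:
    """Hypothetical 1-D stand-in for a Gaussian random field: a moving
    average of i.i.d. Gaussian noise. It is spatially autocorrelated but
    carries no biological meaning."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(n + window - 1)]
    return [sum(noise[i:i + window]) / window for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    field = smoothed_noise(n=1000, window=50, rng=rng)
    # Hypothetical aggregated presences: 10 cells in a short stretch of the transect.
    presence_idx = rng.sample(range(200, 240), 10)
    # Background (pseudo-absence) cells drawn from the whole transect.
    background_idx = rng.sample(range(1000), 200)
    auc = auc_presence_background([field[i] for i in presence_idx],
                                  [field[i] for i in background_idx])
    # The field has no biological signal, so any departure from 0.5
    # (in either direction) comes only from how the presences are placed.
    print(f"AUC of a meaningless field: {auc:.3f}")
```

Because the background sample is drawn from the whole transect, it can include the presence cells themselves. This is one way pseudo-absences differ from confirmed absences.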