Pseudo-absences, pseudo-models and pseudo-niches: Pitfalls of model selection based on the area under the curve

Authors: Golicher, D., Ford, A., Cayuela, L. and Newton, A.

Journal: International Journal of Geographical Information Science

Volume: 26

Issue: 11

Pages: 2049-2063

eISSN: 1362-3087

ISSN: 1365-8816

DOI: 10.1080/13658816.2012.719626

Abstract:

The area under the curve (AUC) of the receiver operator characteristic (ROC) graph is regarded as an objective measure of the discrimination accuracy of predictive models. AUC scores calculated from background values, or pseudo-absences, have been proposed as a method of model selection for species distribution models (SDMs) fitted to presence-only data. However, the utility of AUC as a measure of model performance when data on confirmed absence are unavailable has not been fully investigated. We fitted SDMs using informative climatic variables for 2000 species of Mesoamerican trees. As a reference, we also built 'pseudo-models' using Gaussian random fields with no biological meaning. AUC correctly selected SDMs fitted to single environmental variables over 'pseudo-models' fitted to single random fields in almost all cases. However, when all seven variables were included in the models, AUC erroneously selected complex pseudo-models over complex climate models in 17% of the cases. The spatial distribution patterns predicted by the pseudo-models differed from the results derived from climate-based models, even when overall AUC scores were similar. Both model and pseudo-model AUC values increased when presence points were few and spatially aggregated. The results show that AUC calculated from presence-only data can be an unreliable guide for model selection. Pseudo-absences have ill-defined properties that challenge the interpretation of AUC values. Inference on multidimensional niche spaces should not be supported by AUC values calculated using pseudo-absences. © 2012 Copyright Taylor and Francis Group, LLC.
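The abstract's central point is that AUC computed against background (pseudo-absence) points can reward a model with no biological meaning, and that AUC rises when presence points are few and spatially aggregated. A small simulation can illustrate this. It is a minimal sketch under stated assumptions, not the authors' procedure. The study fitted SDMs and pseudo-models to records of 2000 tree species. Here, by contrast, the hypothetical "pseudo-model" is a single random field whose sign is chosen to maximise AUC. The random field is approximated by a spectral sum of random cosine waves, a technique swapped in for illustration. All function names (`auc`, `random_field`, `pseudo_model_auc`, `mean_pseudo_auc`) and parameter values are hypothetical.

```python
import math
import random


def auc(presence, background):
    """AUC as the Mann-Whitney probability that a randomly chosen presence
    scores higher than a randomly chosen background point (ties count 0.5)."""
    wins = 0.0
    for p in presence:
        for b in background:
            wins += 1.0 if p > b else 0.5 if p == b else 0.0
    return wins / (len(presence) * len(background))


def random_field(rng, n_waves=50, length_scale=0.2):
    """Approximate a stationary Gaussian random field on the plane by summing
    random cosine waves (spectral method). It carries no biological meaning."""
    waves = [(rng.gauss(0.0, 1.0 / length_scale),
              rng.gauss(0.0, 1.0 / length_scale),
              rng.uniform(0.0, 2.0 * math.pi)) for _ in range(n_waves)]
    amp = math.sqrt(2.0 / n_waves)  # gives the field roughly unit variance
    return lambda x, y: amp * sum(math.cos(kx * x + ky * y + ph)
                                  for kx, ky, ph in waves)


def pseudo_model_auc(field, presence_xy, background_xy):
    """Hypothetical crude 'fit': score points by the random field and keep
    whichever sign (field or -field) discriminates better, max(AUC, 1 - AUC)."""
    a = auc([field(x, y) for x, y in presence_xy],
            [field(x, y) for x, y in background_xy])
    return max(a, 1.0 - a)


def mean_pseudo_auc(clustered, n_fields=30, n_presence=20,
                    n_background=500, seed=1):
    """Average pseudo-model AUC over many independent random fields."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_fields):
        field = random_field(rng)
        # Background (pseudo-absence) points drawn uniformly over the unit square.
        background = [(rng.random(), rng.random()) for _ in range(n_background)]
        if clustered:
            # Presences packed tightly around one site (spatially aggregated).
            cx, cy = rng.uniform(0.2, 0.8), rng.uniform(0.2, 0.8)
            presence = [(rng.gauss(cx, 0.02), rng.gauss(cy, 0.02))
                        for _ in range(n_presence)]
        else:
            # Presences scattered uniformly, like the background.
            presence = [(rng.random(), rng.random()) for _ in range(n_presence)]
        total += pseudo_model_auc(field, presence, background)
    return total / n_fields


if __name__ == "__main__":
    print("dispersed presences:", round(mean_pseudo_auc(clustered=False), 3))
    print("clustered presences:", round(mean_pseudo_auc(clustered=True), 3))
```

Running the script prints a mean AUC for each scenario. The exact values depend on the seed and the hypothetical parameters, but the clustered scenario would be expected to score noticeably higher even though the field is pure noise. This mirrors, in a simplified form, the abstract's observation that AUC values increased when presence points were few and spatially aggregated.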

Source: Scopus

Pseudo-absences, pseudo-models and pseudo-niches: pitfalls of model selection based on the area under the curve

Authors: Golicher, D., Ford, A., Cayuela, L. and Newton, A.

Journal: International Journal of Geographical Information Science

Volume: 26

Issue: 11

Pages: 2049-2063

eISSN: 1362-3087

ISSN: 1365-8816

DOI: 10.1080/13658816.2012.719626

Source: Web of Science (Lite)

Pseudo-absences, pseudo-models and pseudo-niches: pitfalls of model selection based on the area under the curve

Authors: Golicher, D., Ford, A., Newton, A. and Cayuela, L.

Editors: Laffan, S.

Journal: International Journal of Geographical Information Science

Publisher: Taylor & Francis

Abstract:

The area under the curve (AUC) of the receiver operator characteristic (ROC) graph is regarded as an objective measure of the discrimination accuracy of predictive models. AUC scores calculated from background values, or pseudo-absences, have been proposed as a method of model selection for species distribution models (SDMs) fitted to presence-only data. However, the utility of AUC as a measure of model performance when data on confirmed absence are unavailable has not been fully investigated. We fitted SDMs using informative climatic variables for 2000 species of Mesoamerican trees. As a reference, we also built 'pseudo-models' using Gaussian random fields with no biological meaning. AUC correctly selected SDMs fitted to single environmental variables over 'pseudo-models' fitted to single random fields in almost all cases. However, when all seven variables were included in the models, AUC erroneously selected complex pseudo-models over complex climate models in 17% of the cases. The spatial distribution patterns predicted by the pseudo-models differed from the results derived from climate-based models, even when overall AUC scores were similar. Both model and pseudo-model AUC values increased when presence points were few and spatially aggregated. The results show that AUC calculated from presence-only data can be an unreliable guide for model selection. Pseudo-absences have ill-defined properties that challenge the interpretation of AUC values. Inference on multidimensional niche spaces should not be supported by AUC values calculated using pseudo-absences.

Source: Manual

Preferred by: Adrian Newton, Andrew Ford and Duncan Golicher