Regression with empirical variable selection: Description of a new method and application to ecological datasets

Authors: Goodenough, A.E., Hart, A.G. and Stafford, R.

Journal: PLoS ONE

Volume: 7

Issue: 3

eISSN: 1932-6203

DOI: 10.1371/journal.pone.0034338

Abstract:

Despite recent papers on problems associated with full-model and stepwise regression, their use is still common throughout ecological and environmental disciplines. Alternative approaches, including generating multiple models and comparing them post-hoc using techniques such as Akaike's Information Criterion (AIC), are becoming more popular. However, these are problematic when there are numerous independent variables and interpretation is often difficult when competing models contain many different variables and combinations of variables. Here, we detail a new approach, REVS (Regression with Empirical Variable Selection), which uses all-subsets regression to quantify empirical support for every independent variable. A series of models is created; the first containing the variable with most empirical support, the second containing the first variable and the next most-supported, and so on. The comparatively small number of resultant models (n = the number of predictor variables) means that post-hoc comparison is comparatively quick and easy. When tested on a real dataset - habitat and offspring quality in the great tit (Parus major) - the optimal REVS model explained more variance (higher R²), was more parsimonious (lower AIC), and had greater significance (lower P values), than full, stepwise or all-subsets models; it also had higher predictive accuracy based on split-sample validation. Testing REVS on ten further datasets suggested that this is typical, with R² values being higher than full or stepwise models (mean improvement = 31% and 7%, respectively). Results are ecologically intuitive as even when there are several competing models, they share a set of "core" variables and differ only in presence/absence of one or two additional variables. We conclude that REVS is useful for analysing complex datasets, including those in ecology and environmental disciplines. © 2012 Goodenough et al.
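
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how "empirical support" is computed, so the sketch assumes it is the number of best-per-size subsets (lowest residual sum of squares at each subset size) that contain a given variable. The function names (fit_rss, aic, revs_sketch), the synthetic data, and the Gaussian AIC formula used for the post-hoc comparison are all illustrative assumptions.

```python
from itertools import combinations

import numpy as np


def fit_rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def aic(rss, n, n_params):
    """Gaussian AIC up to an additive constant (assumed comparison criterion)."""
    return n * np.log(rss / n) + 2 * n_params


def revs_sketch(X, y):
    n, p = X.shape
    # Step 1 (assumption): all-subsets regression; for each subset size keep the
    # subset with the lowest RSS and count how often each variable appears.
    support = np.zeros(p, dtype=int)
    for size in range(1, p + 1):
        best = min(combinations(range(p), size),
                   key=lambda cols: fit_rss(X, y, cols))
        support[list(best)] += 1
    # Step 2: rank variables by support (ties broken by column index).
    order = sorted(range(p), key=lambda j: (-support[j], j))
    # Step 3: build the nested series of p models and score each one.
    models = []
    for k in range(1, p + 1):
        cols = order[:k]
        rss = fit_rss(X, y, cols)
        # k slopes + intercept + error variance
        models.append((cols, aic(rss, n, k + 2)))
    return support, models


# Synthetic example: only columns 0 and 2 influence the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=60)

support, models = revs_sketch(X, y)
best_cols, best_aic = min(models, key=lambda m: m[1])
print("support per variable:", support)
print("lowest-AIC model uses columns:", best_cols)
```

The exhaustive search in step 1 fits 2^p - 1 models, so it is only practical for a modest number of predictors. The final comparison involves only p nested models, which is the property the abstract emphasises.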

https://eprints.bournemouth.ac.uk/24694/

Source: Scopus

Regression with empirical variable selection: description of a new method and application to ecological datasets.

Authors: Goodenough, A.E., Hart, A.G. and Stafford, R.

Journal: PLoS One

Volume: 7

Issue: 3

Pages: e34338

eISSN: 1932-6203

DOI: 10.1371/journal.pone.0034338

https://eprints.bournemouth.ac.uk/24694/

Source: PubMed

Regression with Empirical Variable Selection: Description of a New Method and Application to Ecological Datasets

Authors: Goodenough, A.E., Hart, A.G. and Stafford, R.

Journal: PLOS ONE

Volume: 7

Issue: 3

ISSN: 1932-6203

DOI: 10.1371/journal.pone.0034338

https://eprints.bournemouth.ac.uk/24694/

Source: Web of Science (Lite)

Regression with Empirical Variable Selection: Description of a New Method and Application to Ecological Datasets

Authors: Goodenough, A.E., Hart, A.G. and Stafford, R.

Journal: Plos One

Volume: 7

ISSN: 1932-6203

DOI: 10.1371/journal.pone.0034338

https://eprints.bournemouth.ac.uk/24694/

Source: Manual

Preferred by: Rick Stafford

Regression with empirical variable selection: description of a new method and application to ecological datasets.

Authors: Goodenough, A.E., Hart, A.G. and Stafford, R.

Journal: PloS one

Volume: 7

Issue: 3

Pages: e34338

eISSN: 1932-6203

ISSN: 1932-6203

DOI: 10.1371/journal.pone.0034338

https://eprints.bournemouth.ac.uk/24694/

Source: Europe PubMed Central

Regression with empirical variable selection: description of a new method and application to ecological datasets.

Authors: Goodenough, A.E., Hart, A.G. and Stafford, R.

Journal: PLoS One

Volume: 7

Issue: 3

Pages: e34338

ISSN: 1932-6203

https://eprints.bournemouth.ac.uk/24694/

Source: BURO EPrints