Density-preserving sampling: Robust and efficient alternative to cross-validation for error estimation

Authors: Budka, M. and Gabrys, B.

Journal: IEEE Transactions on Neural Networks and Learning Systems

Volume: 24

Issue: 1

Pages: 22-34

eISSN: 2162-2388

ISSN: 2162-237X

DOI: 10.1109/TNNLS.2012.2222925

Abstract:

Estimation of the generalization ability of a classification or regression model is an important issue, as it indicates the expected performance on previously unseen data and is also used for model selection. Currently used generalization error estimation procedures, such as cross-validation (CV) or bootstrap, are stochastic and, thus, require multiple repetitions in order to produce reliable results, which can be computationally expensive, if not prohibitive. The correntropy-inspired density-preserving sampling (DPS) procedure proposed in this paper eliminates the need for repeating the error estimation procedure by dividing the available data into subsets that are guaranteed to be representative of the input dataset. This allows the production of low-variance error estimates with an accuracy comparable to 10 times repeated CV at a fraction of the computations required by CV. This method can also be used for model ranking and selection. This paper derives the DPS procedure and investigates its usability and performance using a set of public benchmark datasets and standard classifiers. © 2012 IEEE.

https://eprints.bournemouth.ac.uk/20876/

Source: Scopus
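
For intuition, the sketch below illustrates the flavour of the density-preserving split described in the abstract above. It is a minimal illustration only, assuming a simple greedy nearest-neighbour pairing heuristic rather than the correntropy-based criterion the paper actually derives, and the names density_preserving_split and dps_folds are hypothetical.

# Illustrative sketch only: split a dataset into two halves that cover the
# same dense regions by pairing each point with its nearest unassigned
# neighbour and sending one point of each pair to each half. This is a
# simplified stand-in for the paper's correntropy-based DPS criterion.
import numpy as np
from scipy.spatial.distance import cdist

def density_preserving_split(X, seed=None):
    """Return two index arrays of (almost) equal size whose empirical
    densities resemble that of X (hypothetical helper, not the paper's code)."""
    rng = np.random.default_rng(seed)
    D = cdist(X, X)                      # pairwise Euclidean distances
    np.fill_diagonal(D, np.inf)          # a point cannot pair with itself
    unassigned = set(range(len(X)))
    half_a, half_b = [], []
    while len(unassigned) >= 2:
        i = unassigned.pop()             # arbitrary unassigned point
        cand = np.fromiter(unassigned, dtype=int)
        j = int(cand[np.argmin(D[i, cand])])  # its nearest unassigned neighbour
        unassigned.remove(j)
        if rng.random() < 0.5:           # one point of the pair to each half
            half_a.append(i); half_b.append(j)
        else:
            half_a.append(j); half_b.append(i)
    half_a.extend(unassigned)            # odd leftover point, if any
    return np.array(half_a, dtype=int), np.array(half_b, dtype=int)

def dps_folds(X, t=2):
    """Apply the split recursively t times, yielding 2**t representative folds."""
    folds = [np.arange(len(X))]
    for _ in range(t):
        folds = [idx[part]
                 for idx in folds
                 for part in density_preserving_split(X[idx])]
    return folds

Because each of the 2**t folds is intended to be representative of the whole dataset, each can serve once as a validation set, which is how DPS avoids the repeated resampling needed to stabilise CV and bootstrap estimates.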

Density-preserving sampling: robust and efficient alternative to cross-validation for error estimation.

Authors: Budka, M. and Gabrys, B.

Journal: IEEE Trans Neural Netw Learn Syst

Volume: 24

Issue: 1

Pages: 22-34

ISSN: 2162-237X

DOI: 10.1109/TNNLS.2012.2222925

Abstract:

Estimation of the generalization ability of a classification or regression model is an important issue, as it indicates the expected performance on previously unseen data and is also used for model selection. Currently used generalization error estimation procedures, such as cross-validation (CV) or bootstrap, are stochastic and, thus, require multiple repetitions in order to produce reliable results, which can be computationally expensive, if not prohibitive. The correntropy-inspired density-preserving sampling (DPS) procedure proposed in this paper eliminates the need for repeating the error estimation procedure by dividing the available data into subsets that are guaranteed to be representative of the input dataset. This allows the production of low-variance error estimates with an accuracy comparable to 10 times repeated CV at a fraction of the computations required by CV. This method can also be used for model ranking and selection. This paper derives the DPS procedure and investigates its usability and performance using a set of public benchmark datasets and standard classifiers.

https://eprints.bournemouth.ac.uk/20876/

Source: PubMed

Density-Preserving Sampling: Robust and Efficient Alternative to Cross-Validation for Error Estimation

Authors: Budka, M. and Gabrys, B.

Journal: IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

Volume: 24

Issue: 1

Pages: 22-34

eISSN: 2162-2388

ISSN: 2162-237X

DOI: 10.1109/TNNLS.2012.2222925

https://eprints.bournemouth.ac.uk/20876/

Source: Web of Science (Lite)

Density-Preserving Sampling: Robust and Efficient Alternative to Cross-Validation for Error Estimation

Authors: Budka, M. and Gabrys, B.

Journal: IEEE Transactions on Neural Networks and Learning Systems

Volume: 24

Issue: 1

Pages: 22-34

ISSN: 2162-237X

DOI: 10.1109/TNNLS.2012.2222925

Abstract:

Estimation of the generalization ability of a classification or regression model is an important issue, as it indicates the expected performance on previously unseen data and is also used for model selection. Currently used generalization error estimation procedures, such as cross-validation (CV) or bootstrap, are stochastic and, thus, require multiple repetitions in order to produce reliable results, which can be computationally expensive, if not prohibitive. The correntropy-inspired density-preserving sampling (DPS) procedure proposed in this paper eliminates the need for repeating the error estimation procedure by dividing the available data into subsets that are guaranteed to be representative of the input dataset. This allows the production of low-variance error estimates with an accuracy comparable to 10 times repeated CV at a fraction of the computations required by CV. This method can also be used for model ranking and selection. This paper derives the DPS procedure and investigates its usability and performance using a set of public benchmark datasets and standard classifiers.

https://eprints.bournemouth.ac.uk/20876/

Source: Manual

Preferred by: Marcin Budka

Density-Preserving Sampling: Robust and Efficient Alternative to Cross-Validation for Error Estimation.

Authors: Budka, M. and Gabrys, B.

Journal: IEEE Trans. Neural Networks Learn. Syst.

Volume: 24

Pages: 22-34

DOI: 10.1109/TNNLS.2012.2222925

https://eprints.bournemouth.ac.uk/20876/

Source: DBLP

Density-preserving sampling: robust and efficient alternative to cross-validation for error estimation.

Authors: Budka, M. and Gabrys, B.

Journal: IEEE transactions on neural networks and learning systems

Volume: 24

Issue: 1

Pages: 22-34

eISSN: 2162-2388

ISSN: 2162-237X

DOI: 10.1109/TNNLS.2012.2222925

Abstract:

Estimation of the generalization ability of a classification or regression model is an important issue, as it indicates the expected performance on previously unseen data and is also used for model selection. Currently used generalization error estimation procedures, such as cross-validation (CV) or bootstrap, are stochastic and, thus, require multiple repetitions in order to produce reliable results, which can be computationally expensive, if not prohibitive. The correntropy-inspired density-preserving sampling (DPS) procedure proposed in this paper eliminates the need for repeating the error estimation procedure by dividing the available data into subsets that are guaranteed to be representative of the input dataset. This allows the production of low-variance error estimates with an accuracy comparable to 10 times repeated CV at a fraction of the computations required by CV. This method can also be used for model ranking and selection. This paper derives the DPS procedure and investigates its usability and performance using a set of public benchmark datasets and standard classifiers.

https://eprints.bournemouth.ac.uk/20876/

Source: Europe PubMed Central

Density-Preserving Sampling: Robust and Efficient Alternative to Cross-Validation for Error Estimation

Authors: Budka, M. and Gabrys, B.

Journal: IEEE Transactions on Neural Networks and Learning Systems

Volume: 24

Issue: 1

Pages: 22-34

ISSN: 2162-237X

Abstract:

Estimation of the generalization ability of a classification or regression model is an important issue, as it indicates the expected performance on previously unseen data and is also used for model selection. Currently used generalization error estimation procedures, such as cross-validation (CV) or bootstrap, are stochastic and, thus, require multiple repetitions in order to produce reliable results, which can be computationally expensive, if not prohibitive. The correntropy-inspired density-preserving sampling (DPS) procedure proposed in this paper eliminates the need for repeating the error estimation procedure by dividing the available data into subsets that are guaranteed to be representative of the input dataset. This allows the production of low-variance error estimates with an accuracy comparable to 10 times repeated CV at a fraction of the computations required by CV. This method can also be used for model ranking and selection. This paper derives the DPS procedure and investigates its usability and performance using a set of public benchmark datasets and standard classifiers.

https://eprints.bournemouth.ac.uk/20876/

Source: BURO EPrints