Reducing spatial data complexity for classification models

Authors: Ruta, D. and Gabrys, B.

Journal: AIP Conference Proceedings

Volume: 963

Issue: 1

Pages: 603-613

eISSN: 1551-7616

ISBN: 9780735404779

ISSN: 0094-243X

DOI: 10.1063/1.2827047

Abstract:

Intelligent data analytics is gradually becoming a day-to-day reality of today's businesses. However, despite rapidly increasing storage and computational power, current state-of-the-art predictive models still cannot handle massive and noisy corporate data warehouses. What is more, adaptive and real-time operational environments require multiple models to be frequently retrained, which further hinders their use. Various data reduction techniques, ranging from data sampling to density retention models, attempt to address this challenge by capturing a summarised data structure, yet they either do not account for labelled data or degrade the classification performance of the model trained on the condensed dataset. In response, we propose a new general framework for reducing the complexity of labelled data by means of controlled spatial redistribution of class densities in the input space. Using the Parzen Labelled Data Compressor (PLDC) as an example, we demonstrate a simulated data condensation process directly inspired by electrostatic field interaction, in which the data are moved and merged following attracting and repelling interactions with the other labelled data. The process is controlled by the class density function built on the original data, which acts as a class-sensitive potential field that ensures preservation of the original class density distributions yet allows data to rearrange and merge, joining together their soft class partitions. As a result, we achieved a model that reduces labelled datasets much further than any competing approach, yet with maximum retention of the original class densities and hence of classification performance.
PLDC leaves the reduced dataset with soft accumulative class weights, allowing for efficient online updates. As shown in a series of experiments, when coupled with the Parzen Density Classifier (PDC) it significantly outperforms competing data condensation methods in terms of classification performance at comparable compression levels. © 2007 American Institute of Physics.
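
The abstract's idea of a condensed dataset carrying soft class weights, classified by a Parzen density estimate, can be illustrated with a minimal sketch. This is not the authors' PLDC algorithm: the Gaussian kernel, the fixed bandwidth, the weight-averaged merge rule, and all function names (`gaussian_kernel`, `class_densities`, `classify`, `merge`) are assumptions made here for illustration only, and the field-driven movement of points is omitted entirely.

```python
import math


def gaussian_kernel(x, centre, bandwidth):
    """Isotropic Gaussian kernel value at x for a kernel centred at `centre`."""
    dim = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, centre))
    norm = (2.0 * math.pi * bandwidth ** 2) ** (dim / 2.0)
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2)) / norm


def class_densities(x, prototypes, bandwidth):
    """Weighted Parzen estimate of every class density at x.

    `prototypes` is a list of (position, class_weights) pairs, where
    class_weights[c] is the soft weight of class c carried by that point.
    """
    n_classes = len(prototypes[0][1])
    densities = [0.0] * n_classes
    for position, weights in prototypes:
        k = gaussian_kernel(x, position, bandwidth)
        for c, w in enumerate(weights):
            densities[c] += w * k
    return densities


def classify(x, prototypes, bandwidth):
    """Return the index of the class with the highest estimated density at x."""
    densities = class_densities(x, prototypes, bandwidth)
    return max(range(len(densities)), key=densities.__getitem__)


def merge(p, q):
    """Merge two prototypes (assumed rule, not taken from the paper):
    position is the weight-averaged centroid, class weights are summed."""
    (pos_p, w_p), (pos_q, w_q) = p, q
    mass_p, mass_q = sum(w_p), sum(w_q)
    total = mass_p + mass_q
    position = tuple((mass_p * a + mass_q * b) / total
                     for a, b in zip(pos_p, pos_q))
    weights = tuple(a + b for a, b in zip(w_p, w_q))
    return position, weights


# Three labelled points in 2-D; class weights are one-hot before any merging.
original = [
    ((0.0, 0.0), (1.0, 0.0)),
    ((0.2, 0.0), (1.0, 0.0)),
    ((3.0, 3.0), (0.0, 1.0)),
]

# Condense by merging the two nearby class-0 points into one prototype.
condensed = [merge(original[0], original[1]), original[2]]
```

Because the merged prototype keeps the summed class weights, the condensed set holds two points instead of three while each class's total weight is unchanged. A later update could add a new labelled point's weight to the nearest prototype in the same way, although whether PLDC does exactly this is not stated in the abstract.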

https://eprints.bournemouth.ac.uk/8519/

Source: Scopus

Reducing spatial data complexity for classification models

Authors: Ruta, D. and Gabrys, B.

Journal: Computational Methods in Science and Engineering, Vol. 1

Volume: 963

Pages: 603-613

ISBN: 978-0-7354-0477-9

ISSN: 0094-243X

https://eprints.bournemouth.ac.uk/8519/

Source: Web of Science (Lite)

Reducing Spatial Data Complexity for Classification Models

Authors: Ruta, D. and Gabrys, B.

Editors: Maroulis, G. and Simos, T.E.

Volume: 1

Pages: 603-613

Publisher: American Institute of Physics

Place of Publication: Melville, N.Y.

DOI: 10.1063/1.2827047

https://eprints.bournemouth.ac.uk/8519/

http://scitation.aip.org/getabs/servlet/GetabsServlet?prog=normal&id=APCPCS000963000001000603000001&idtype=cvips&gifs=yes

Source: Manual

Preferred by: Dymitr Ruta

Reducing Spatial Data Complexity for Classification Models

Authors: Ruta, D. and Gabrys, B.

Editors: Maroulis, G. and Simos, T.E.

Volume: 1

Pages: 603-613

Publisher: American Institute of Physics

Place of Publication: Melville, N.Y.

ISSN: 0094-243X

https://eprints.bournemouth.ac.uk/8519/

http://scitation.aip.org/getabs/servlet/GetabsServlet?prog=normal&id=APCPCS000963000001000603000001&idtype=cvips&gifs=yes

Source: BURO EPrints