Active learning for data streams under concept drift and concept evolution

Authors: Mohamad, S., Sayed-Mouchaweh, M. and Bouchachia, A.

Journal: CEUR Workshop Proceedings

Volume: 2069

ISSN: 1613-0073

Abstract:

Data stream classification is an important problem; however, it poses many challenges. Since the length of the data is theoretically infinite, it is impractical to store and process all the historical data. Data streams also experience changes in their underlying distribution (concept drift), so the classifier must adapt. Another challenge of data stream classification is the possible emergence and disappearance of classes, which is known as the concept evolution problem. On top of these challenges, acquiring labels for such large volumes of data is expensive. In this paper, we propose a stream-based active learning (AL) strategy (SAL) that handles the aforementioned challenges. SAL aims to query the labels of the samples that optimize the expected future error. It handles concept drift and concept evolution by adapting to the change in the stream. Furthermore, as part of the error reduction process, SAL handles the sampling bias problem and queries the samples that caused the change, i.e., drifted samples or samples coming from new classes. To tackle the lack of prior knowledge about the streaming data, non-parametric Bayesian modelling is adopted, namely two representations of the Dirichlet process: Dirichlet mixture models and the stick-breaking process. Empirical results obtained on real-world benchmarks show the high performance of the proposed SAL method compared with state-of-the-art methods.
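The abstract names the stick-breaking process as one of the two Dirichlet process representations that SAL builds on. As a generic illustration only (not the authors' implementation), the sketch below draws truncated stick-breaking (GEM) weights: each step breaks off a Beta(1, alpha) fraction of the remaining stick, so the weights stay non-negative and sum to 1. The function name `stick_breaking_weights`, the truncation level and the value of `alpha` are hypothetical choices made for this example.

```python
import random


def stick_breaking_weights(alpha, truncation, seed=None):
    """Hypothetical helper: truncated stick-breaking weights of a Dirichlet process.

    beta_k ~ Beta(1, alpha) and pi_k = beta_k * prod_{j<k} (1 - beta_j).
    The last weight takes the leftover stick so the weights sum to 1.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if truncation < 1:
        raise ValueError("truncation must be at least 1")
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(truncation - 1):
        beta = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(remaining * beta)
        remaining *= 1.0 - beta
    weights.append(remaining)  # truncation: the final piece is the rest of the stick
    return weights


weights = stick_breaking_weights(alpha=2.0, truncation=10, seed=0)
print([round(w, 3) for w in weights])
print(round(sum(weights), 6))
```

Roughly speaking, a small `alpha` tends to concentrate mass on the first few components, while a larger `alpha` spreads it over more of them. This is one way a Dirichlet process prior leaves room for components (for instance, newly emerging classes) that were not specified in advance.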

https://eprints.bournemouth.ac.uk/29868/

Source: Scopus

Active Learning for Data Streams under Concept Drift and Concept Evolution

Authors: Mohamad, S., Sayed-Mouchaweh, M. and Bouchachia, A.

Conference: ECML/PKDD 2016 Workshop on Large-scale Learning from Data Streams in Evolving Environments (STREAMEVOLV-2016)

Dates: 23 September 2016

https://eprints.bournemouth.ac.uk/29868/

Source: Manual

Active Learning for Data Streams under Concept Drift and Concept Evolution

Authors: Mohamad, S., Sayed-Mouchaweh, M. and Bouchachia, A.

Conference: ECML/PKDD 2016 Workshop on Large-scale Learning from Data Streams in Evolving Environments (STREAMEVOLV-2016)


https://eprints.bournemouth.ac.uk/29868/

http://www.ecmlpkdd2016.org/downloads/program_booklet.pdf

Source: BURO EPrints