Active learning for classifying data streams with unknown number of classes

Authors: Mohamad, S., Sayed-Mouchaweh, M. and Bouchachia, A.

Journal: Neural Networks

Volume: 98

Pages: 1-15

eISSN: 1879-2782

ISSN: 0893-6080

DOI: 10.1016/j.neunet.2017.10.004

Abstract:

The classification of data streams is an interesting but also a challenging problem. A data stream may grow infinitely, making it impractical to store prior to processing and classification. Due to its dynamic nature, the underlying distribution of the data stream may change over time, resulting in the so-called concept drift, or classes may emerge and fade, which is known as concept evolution. In addition, acquiring labels of data samples in a stream is admittedly expensive, if not infeasible. In this paper, we propose a novel stream-based active learning algorithm (SAL) which is capable of coping with both concept drift and concept evolution by adapting the classification model to the dynamic changes in the stream. SAL is the first AL algorithm in the literature to explicitly take account of these concepts. Moreover, SAL queries only the labels of samples that are expected to reduce the expected future error. This is done while tackling the problem of sampling bias, so that samples that induce the change (i.e., drifting samples or samples coming from new classes) are queried. To efficiently implement SAL, the paper proposes the application of non-parametric Bayesian models, which cope with the lack of prior knowledge about the data stream. In particular, Dirichlet mixture models and the stick-breaking process are adopted and adapted to meet the requirements of online learning. The empirical results obtained on real-world benchmarks demonstrate that SAL outperforms state-of-the-art methods in classification performance, measured by average accuracy and average class accuracy.
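The abstract names the stick-breaking process as the device that lets a Dirichlet mixture model hold an unbounded number of components (and hence an unknown number of classes). As an illustration only, the sketch below draws mixture weights from the standard truncated stick-breaking construction, where each break fraction is beta_k ~ Beta(1, alpha) and pi_k = beta_k * prod_{j<k}(1 - beta_j). The function name, the truncation level, and closing the stick at the last component are assumptions made for this sketch; it is not the paper's online SAL procedure.

```python
import random
from typing import List


def stick_breaking_weights(alpha: float, truncation: int, rng: random.Random) -> List[float]:
    """Hypothetical helper: draw truncated stick-breaking mixture weights.

    alpha controls concentration: small values put most mass on the first few
    components, large values spread it over many components.
    """
    if alpha <= 0 or truncation < 1:
        raise ValueError("alpha must be > 0 and truncation >= 1")
    weights: List[float] = []
    remaining = 1.0  # length of the stick not yet broken off
    for k in range(truncation):
        # Assumption: the last component takes the whole remainder so the
        # truncated weights sum to 1.
        beta_k = 1.0 if k == truncation - 1 else rng.betavariate(1.0, alpha)
        weights.append(beta_k * remaining)  # pi_k = beta_k * prod_{j<k}(1 - beta_j)
        remaining *= 1.0 - beta_k
    return weights


if __name__ == "__main__":
    rng = random.Random(42)
    w = stick_breaking_weights(alpha=2.0, truncation=25, rng=rng)
    print([round(x, 3) for x in w[:5]], "sum =", round(sum(w), 6))
```

Because the weights are generated sequentially, a new component (for example, a newly emerging class) can in principle be given mass without fixing the number of components in advance; how the paper adapts this to the online setting is described in the full text.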

https://eprints.bournemouth.ac.uk/29869/

Source: Scopus

Active learning for classifying data streams with unknown number of classes.

Authors: Mohamad, S., Sayed-Mouchaweh, M. and Bouchachia, A.

Journal: Neural Netw

Volume: 98

Pages: 1-15

eISSN: 1879-2782

DOI: 10.1016/j.neunet.2017.10.004

https://eprints.bournemouth.ac.uk/29869/

Source: PubMed

Active learning for classifying data streams with unknown number of classes

Authors: Mohamad, S., Sayed-Mouchaweh, M. and Bouchachia, A.

Journal: NEURAL NETWORKS

Volume: 98

Pages: 1-15

eISSN: 1879-2782

ISSN: 0893-6080

DOI: 10.1016/j.neunet.2017.10.004

https://eprints.bournemouth.ac.uk/29869/

Source: Web of Science (Lite)

Active Learning for Classifying Data Streams with Unknown Number of Classes

Authors: Mohamad, S., Sayed-Mouchaweh, M. and Bouchachia, A.

Journal: Neural Networks

Publisher: Pergamon Press Ltd.

ISSN: 0893-6080

https://eprints.bournemouth.ac.uk/29869/

Source: Manual

Active learning for classifying data streams with unknown number of classes.

Authors: Mohamad, S., Sayed-Mouchaweh, M. and Bouchachia, A.

Journal: Neural networks : the official journal of the International Neural Network Society

Volume: 98

Pages: 1-15

eISSN: 1879-2782

ISSN: 0893-6080

DOI: 10.1016/j.neunet.2017.10.004

https://eprints.bournemouth.ac.uk/29869/

Source: Europe PubMed Central

Active Learning for Classifying Data Streams with Unknown Number of Classes.

Authors: Mohamad, S., Sayed-Mouchaweh, M. and Bouchachia, A.

Journal: Neural Networks

Volume: 98

Issue: February

Pages: 1-15

ISSN: 0893-6080

https://eprints.bournemouth.ac.uk/29869/

Source: BURO EPrints