Effect of data resampling on feature importance in imbalanced blockchain data: Comparison studies of resampling techniques

Authors: Alarab, I. and Prakoonwit, S.

Journal: Data Science and Management

Volume: 5

Issue: 2

Pages: 66-76

eISSN: 2666-7649

DOI: 10.1016/j.dsm.2022.04.003

Abstract:

Cryptocurrency blockchain data suffer from a class-imbalance problem because only a few labels of illicit or fraudulent activity are known in the blockchain network. To address this, we compare various resampling methods applied to two highly imbalanced datasets derived from the Bitcoin and Ethereum blockchains after further dimensionality reduction, which distinguishes this work from previous studies on these datasets. Firstly, we study the performance of various classical supervised learning methods in classifying illicit transactions on the Bitcoin dataset and illicit accounts on the Ethereum dataset. Next, we apply various resampling techniques to each dataset using the learning algorithm that performed best on it. Subsequently, we study the feature importance of the resulting models, wherein the resampled datasets directly influence the explainability of the model. Our main finding is that undersampling with the edited nearest-neighbour technique attains an accuracy of more than 99% on the given datasets by removing noisy data points from the whole dataset. Moreover, the best-performing learning algorithms show superior performance after feature reduction on these datasets in comparison to their original studies. The distinctive contribution lies in discussing the effect of data resampling on feature importance, which is interconnected with explainable artificial intelligence (XAI) techniques.
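
This record does not show the authors' implementation, so the following is only a minimal sketch of the general idea behind edited nearest-neighbour undersampling: a majority-class point is dropped when most of its nearest neighbours carry a different label. The function name `edited_nearest_neighbours`, the choice of Euclidean distance, `k=3`, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def edited_nearest_neighbours(X, y, majority_label, k=3):
    """Illustrative edited nearest-neighbour undersampling (hypothetical helper).

    A majority-class point is removed when fewer than half of its k nearest
    neighbours (Euclidean distance, excluding the point itself) share its label.
    Minority-class points are always kept.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority_label:
            keep.append(i)  # never drop minority (e.g. illicit) samples
            continue
        # k nearest neighbours of point i among all other points
        neighbours = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )[:k]
        votes = Counter(y[j] for _, j in neighbours)
        # keep the point only if a strict majority of neighbours agree with it
        if votes[yi] * 2 > len(neighbours):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (made up): label 0 = licit (majority), label 1 = illicit (minority).
X = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.2, 0.1),  # licit cluster
    (1.0, 1.0), (1.1, 1.0), (1.0, 1.1),                          # illicit cluster
    (1.05, 1.05),  # licit point sitting inside the illicit cluster (noise)
]
y = [0, 0, 0, 0, 0, 1, 1, 1, 0]

X_res, y_res = edited_nearest_neighbours(X, y, majority_label=0, k=3)
print(len(X), "->", len(X_res), "samples after editing")
```

On this toy set only the isolated licit point inside the illicit cluster is removed, which mirrors the abstract's description of removing noisy data points. A practical study would more likely rely on an established library implementation than on a quadratic-time loop like this one.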

https://eprints.bournemouth.ac.uk/37046/

Source: Scopus

Effect of data resampling on feature importance in imbalanced blockchain data: Comparison studies of resampling techniques

Authors: Alarab, I. and Prakoonwit, S.

Journal: Data Science and Management

https://eprints.bournemouth.ac.uk/37046/

https://www.sciencedirect.com/science/article/pii/S2666764922000145

Source: Manual

Effect of data resampling on feature importance in imbalanced blockchain data: Comparison studies of resampling techniques

Authors: Alarab, I. and Prakoonwit, S.

Journal: Data Science and Management

Volume: 5

Issue: 2

Pages: 66-76

ISSN: 2666-7649

Abstract:

Cryptocurrency blockchain data suffers from a class-imbalance problem because only a few labels of illicit or fraudulent activity are known in the blockchain network. To address this, we provide a comparison of various resampling methods applied to two highly imbalanced datasets derived from the Bitcoin and Ethereum blockchains after further dimensionality reduction, unlike previous studies on these datasets. Firstly, we study the performance of various classical supervised learning methods in classifying illicit transactions on the Bitcoin dataset and illicit accounts on the Ethereum dataset. Next, we apply a variety of resampling techniques to each dataset using the learning algorithm that performed best on it. Subsequently, we study the feature importance of the resulting models, wherein the resampled datasets reveal a direct influence on the explainability of the model. Our main finding is that undersampling with the edited nearest-neighbour technique attains an accuracy of more than 99% on the given datasets by removing noisy data points from the whole dataset. Moreover, the best-performing learning algorithms show superior performance after feature reduction on these datasets in comparison to their original studies. The distinctive contribution lies in discussing the effect of data resampling on feature importance, which is interconnected with explainable artificial intelligence techniques.

https://eprints.bournemouth.ac.uk/37046/

Source: BURO EPrints