Effect of data resampling on feature importance in imbalanced blockchain data: Comparison studies of resampling techniques
Authors: Alarab, I. and Prakoonwit, S.
Journal: Data Science and Management
Volume: 5
Issue: 2
Pages: 66-76
eISSN: 2666-7649
DOI: 10.1016/j.dsm.2022.04.003
Abstract: Cryptocurrency blockchain data suffer from a class-imbalance problem because only a few labels of illicit or fraudulent activity in the blockchain network are known. To address this, we compare various resampling methods applied to two highly imbalanced datasets derived from the Bitcoin and Ethereum blockchains after further dimensionality reduction, which distinguishes this work from previous studies on these datasets. First, we study the performance of various classical supervised learning methods in classifying illicit transactions on the Bitcoin dataset and illicit accounts on the Ethereum dataset. Next, we apply various resampling techniques to each dataset using its best-performing learning algorithm. We then study the feature importance of the resulting models and find that the resampled datasets directly influence the explainability of the model. Our main finding is that undersampling with the edited nearest-neighbour technique attains an accuracy of more than 99% on the given datasets by removing noisy data points from the whole dataset. Moreover, the best-performing learning algorithms show superior performance after feature reduction on these datasets compared with the original studies. The distinctive contribution lies in discussing the effect of data resampling on feature importance, which is interconnected with explainable artificial intelligence (XAI) techniques.
https://eprints.bournemouth.ac.uk/37046/
Source: Scopus
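The abstract above credits edited nearest-neighbour (ENN) undersampling with removing noisy data points. The record gives no implementation, so the following is only a minimal sketch of the general idea, not the authors' code. It uses the Python standard library only; the function name `edited_nearest_neighbours`, the majority-vote removal rule, the choice of label 0 for the majority (licit) class, and the toy data are all assumptions made for illustration.

```python
from math import dist


def edited_nearest_neighbours(X, y, k=3, majority_label=0):
    """Hypothetical helper: drop majority-class points that most of
    their k nearest neighbours disagree with (majority-vote variant)."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority_label:
            keep.append(i)  # minority samples are always kept
            continue
        # Indices of the k nearest other points by Euclidean distance.
        others = (j for j in range(len(X)) if j != i)
        neighbours = sorted(others, key=lambda j: dist(xi, X[j]))[:k]
        agreeing = sum(1 for j in neighbours if y[j] == yi)
        if 2 * agreeing > k:  # strict majority of neighbours agree
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (made up): label 0 = licit (majority), label 1 = illicit.
# The last point carries label 0 but sits inside the illicit cluster.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.05, 5.05)]
y = [0, 0, 0, 0, 1, 1, 1, 0]

X_res, y_res = edited_nearest_neighbours(X, y, k=3)
print(len(X), "->", len(X_res))  # 8 -> 7
```

In this toy run only the mislabelled-looking majority point is removed, which is the sense in which ENN "cleans" the data. Because such removals change which samples a model sees near the class boundary, they can also shift which features the model ranks as important, which is the effect the paper examines.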
https://www.sciencedirect.com/science/article/pii/S2666764922000145
Source: Manual
Abstract: Cryptocurrency blockchain data suffer from a class-imbalance problem because only a few labels of illicit or fraudulent activity in the blockchain network are known. To address this, we provide a comparison of various resampling methods applied to two highly imbalanced datasets derived from the Bitcoin and Ethereum blockchains after further dimensionality reduction, unlike previous studies on these datasets. First, we study the performance of various classical supervised learning methods in classifying illicit transactions on the Bitcoin dataset and illicit accounts on the Ethereum dataset. Next, we apply a variety of resampling techniques to each dataset using its best-performing learning algorithm. We then study the feature importance of the resulting models, where the resampled datasets reveal a direct influence on the explainability of the model. Our main finding is that undersampling with the edited nearest-neighbour technique attains an accuracy of more than 99% on the given datasets by removing noisy data points from the whole dataset. Moreover, the best-performing learning algorithms show superior performance after feature reduction on these datasets compared with the original studies. The distinctive contribution lies in discussing the effect of data resampling on feature importance, which is interconnected with explainable artificial intelligence techniques.
https://eprints.bournemouth.ac.uk/37046/
Source: BURO EPrints