A New Generative Adversarial Network for Improving Classification Performance for Imbalanced Data

Authors: Strelcenia, D.E.

Conference: Bournemouth University, Faculty of Science and Technology

Abstract:

Imbalanced data is a common issue in many industries, particularly in fields such as fraud detection and medical diagnosis. The term refers to datasets in which the classes are not equally distributed, resulting in an over-representation of one class and an under-representation of another. This can lead to biased and inaccurate machine learning models, as the algorithm may favour the majority class and overlook important patterns in the minority class. Deep neural networks have been used for data synthesis in various sectors, and research in these fields reports that deep neural networks perform better when trained on balanced data than on imbalanced data. Although deep generative approaches such as Generative Adversarial Networks (GANs) are an efficient way to augment high-dimensional data, there is little research on their effectiveness with credit card or breast cancer data, and current methods have limitations. Our research focuses on generating a large number of valid samples that resemble the minority class, in this case fraudulent or malignant samples. Such additional data can be used to train a binary classifier that is more effective at detecting fraud or diagnosing cancer. To overcome the limitations of existing methods, we have developed a novel GAN-based method called K-CGAN, which has been tested on credit card fraud and breast cancer data. K-CGAN is designed to generate synthetic data that resembles the minority class, thereby balancing the dataset and improving the performance of binary classifiers. Our research demonstrates the effectiveness of K-CGAN in handling complex data imbalance problems often encountered in practical applications. In addition, experiments on different datasets indicate that K-CGAN can be used for various purposes.
The application of machine learning algorithms in various industries has become increasingly popular in recent years. However, the quality and quantity of available data directly affect the accuracy and reliability of these models. The scarcity and imbalance of datasets in certain domains pose challenges for researchers and practitioners, and the need for effective solutions is more pressing than ever. In this context, K-CGAN offers a promising approach to addressing data imbalance and improving the performance of machine learning models. Our results show that K-CGAN can be applied to datasets with different characteristics, making it a valuable tool for data scientists and practitioners in various fields.
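
The balancing step described in the abstract (appending generated minority-class samples until the classes are even) can be sketched as follows. This is a minimal illustration under stated assumptions, not K-CGAN: the abstract does not specify the generator's architecture or training objective, so `balance_with_synthetic`, `gaussian_stand_in` and the `sample_minority` callback are hypothetical names, the Gaussian sampler merely occupies the slot a trained conditional generator would fill, and the 1:1 target ratio is also an assumption.

```python
import numpy as np


def balance_with_synthetic(X, y, sample_minority, minority_label=1, rng=None):
    """Append synthetic minority-class rows until both classes have equal counts.

    `sample_minority(n, rng)` is a hypothetical stand-in for a trained
    conditional generator; it must return an (n, d) array of minority samples.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_min = int(np.sum(y == minority_label))
    n_maj = int(np.sum(y != minority_label))
    n_needed = n_maj - n_min  # synthetic rows required for an assumed 1:1 ratio
    if n_needed <= 0:
        return X, y
    X_syn = sample_minority(n_needed, rng)
    y_syn = np.full(n_needed, minority_label, dtype=y.dtype)
    # Real rows come first, synthetic minority rows are appended after them.
    return np.vstack([X, X_syn]), np.concatenate([y, y_syn])


def gaussian_stand_in(X_min):
    """Hypothetical placeholder generator: samples from a Gaussian fitted to the
    minority class. It is NOT a GAN and NOT K-CGAN; it only lets the
    balancing step run end to end."""
    mean = X_min.mean(axis=0)
    cov = np.cov(X_min, rowvar=False)

    def sample(n, rng):
        return rng.multivariate_normal(mean, cov, size=n)

    return sample


# Usage on a toy dataset with 990 majority and 10 minority rows.
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(990, 3))
X_min = rng.normal(2.0, 1.0, size=(10, 3))
X = np.vstack([X_maj, X_min])
y = np.concatenate([np.zeros(990, dtype=int), np.ones(10, dtype=int)])

X_bal, y_bal = balance_with_synthetic(X, y, gaussian_stand_in(X_min), rng=rng)
print(np.bincount(y_bal))  # [990 990]
```

Keeping the generator behind a callback means a trained GAN generator, conditioned on the minority label, could replace the Gaussian placeholder without changing the balancing logic; the classifier would then be trained on `X_bal`, `y_bal`.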

https://eprints.bournemouth.ac.uk/39677/

Source: Manual

A New Generative Adversarial Network for Improving Classification Performance for Imbalanced Data

Authors: Strelcenia, E.

Conference: Bournemouth University

https://eprints.bournemouth.ac.uk/39677/

Source: BURO EPrints