Effective Feature Engineering and Classification of Breast Cancer Diagnosis: A Comparative Study

Authors: Strelcenia, E. and Prakoonwit, S.

Journal: BioMedInformatics

Volume: 3

Issue: 3

Pages: 616-631

eISSN: 2673-7426

DOI: 10.3390/biomedinformatics3030042

Abstract:

Breast cancer is among the most common cancers found in women and a cause of cancer-related deaths, making it a severe public health issue. Early prediction of breast cancer can increase the chances of survival and promote early medical treatment. Moreover, the accurate classification of benign cases can prevent patients from undergoing unnecessary treatments. Therefore, the accurate and early diagnosis of breast cancer and its classification into benign or malignant classes are much-needed research topics. This paper presents a feature engineering method that extracts and modifies features from the data, and examines its effect on different classifiers using the Wisconsin Breast Cancer Diagnosis Dataset. We then use the engineered features to compare six popular machine-learning models for classification. The models compared were Logistic Regression, Random Forest, Decision Tree, K-Neighbors, Multi-Layer Perceptron (MLP), and XGBoost. The results showed that the Decision Tree model, when trained on the engineered features, performed best, achieving an average accuracy of 98.64%.

https://eprints.bournemouth.ac.uk/38841/

Source: Scopus

Effective Feature Engineering and Classification of Breast Cancer Diagnosis: A Comparative Study

Authors: Strelcenia, E. and Prakoonwit, S.

Journal: BioMedInformatics

Volume: 3

Issue: Feature Papers in Computational Biology and Medicine

DOI: 10.3390/biomedinformatics3030042

https://eprints.bournemouth.ac.uk/38841/

Source: Manual

Effective Feature Engineering and Classification of Breast Cancer Diagnosis: A Comparative Study

Authors: Strelcenia, E. and Prakoonwit, S.

Journal: BioMedInformatics

Volume: 3

Issue: 3

Pages: 616-631

ISSN: 2673-7426

https://eprints.bournemouth.ac.uk/38841/

Source: BURO EPrints