A machine learning approach to dataset imputation for software vulnerabilities

Authors: Rostami, S., Kleszcz, A., Dimanov, D. and Katos, V.

Journal: Communications in Computer and Information Science

Volume: 1284 CCIS

Pages: 25-36

eISSN: 1865-0937

ISBN: 9783030589998

ISSN: 1865-0929

DOI: 10.1007/978-3-030-59000-0_3

Abstract:

This paper proposes a supervised machine learning approach for the imputation of missing categorical values in a dataset where the majority of samples are incomplete. Twelve models have been designed that can predict nine of the twelve Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) tactic categories using only the Common Attack Pattern Enumeration and Classification (CAPEC). The proposed method has been evaluated on a test dataset consisting of 867 unseen samples, with the classification accuracy ranging from 99.88% to 100%. These models were employed to generate a more complete dataset with no missing ATT&CK tactic features.
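The idea in the abstract, one classifier per ATT&CK tactic with CAPEC as the only input, can be illustrated with a minimal sketch. The abstract does not state the model family, so a simple per-CAPEC majority vote is swapped in here purely for illustration. The CAPEC identifiers, the two-tactic subset, the toy records, and the function names are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict

# Illustrative subset of tactics only; the paper covers more categories.
TACTICS = ["persistence", "defense-evasion"]

# Hypothetical toy records: (CAPEC id, set of tactics), or None when the
# tactic feature is missing and has to be imputed.
records = [
    ("capec_a", {"persistence"}),
    ("capec_a", {"persistence"}),
    ("capec_b", {"defense-evasion"}),
    ("capec_a", None),
    ("capec_b", None),
    ("capec_c", None),
]

def fit_tactic_model(records, tactic):
    """Learn a CAPEC id -> bool majority vote for a single tactic.

    Stand-in for a trained binary classifier (an assumption, not the
    paper's model); only labelled records contribute votes.
    """
    votes = defaultdict(Counter)
    for capec, tactics in records:
        if tactics is not None:
            votes[capec][tactic in tactics] += 1
    return {capec: c.most_common(1)[0][0] for capec, c in votes.items()}

def impute(records, models):
    """Fill missing tactic sets using one binary model per tactic."""
    completed = []
    for capec, tactics in records:
        if tactics is None:
            # CAPEC ids never seen in training default to "tactic absent".
            tactics = {t for t, m in models.items() if m.get(capec, False)}
        completed.append((capec, tactics))
    return completed

models = {t: fit_tactic_model(records, t) for t in TACTICS}
completed = impute(records, models)
```

In this sketch a CAPEC identifier that never appears in the labelled data receives an empty tactic set; a real model would need a considered policy for such cases.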

https://eprints.bournemouth.ac.uk/34258/

Source: Scopus

A Machine Learning Approach to Dataset Imputation for Software Vulnerabilities

Authors: Rostami, S., Kleszcz, A., Dimanov, D. and Katos, V.

Conference: Multimedia Communications, Services & Security (MCSS'20)

Dates: 8-9 October 2020

Journal: Springer

https://eprints.bournemouth.ac.uk/34258/

Source: Manual

A Machine Learning Approach to Dataset Imputation for Software Vulnerabilities

Authors: Katos, V.

Conference: MCSS'20: 10th International Conference on Multimedia Communications, Services & Security

Abstract:

This paper proposes a supervised machine learning approach for imputing categorical values that are missing from the majority of samples in a dataset. Twelve models have been designed that are able to predict nine of the twelve ATT&CK tactic categories using only one feature, namely the Common Attack Pattern Enumeration and Classification (CAPEC). The proposed method has been evaluated on an unseen test set of 867 samples, with classification accuracy in the range of 99.88% to 100%. Using these models, a more complete dataset has been generated with no missing values for the ATT&CK tactic feature.

https://eprints.bournemouth.ac.uk/34258/

Source: BURO EPrints