Exploring discrepancies in findings obtained with the KDD Cup '99 data set

Authors: Engen, V., Vincent, J. and Phalp, K.

Journal: Intelligent Data Analysis

Volume: 15

Issue: 2

Pages: 251-276

eISSN: 1571-4128

ISSN: 1088-467X

DOI: 10.3233/IDA-2010-0466

Abstract:

The KDD Cup '99 data set has been widely used to evaluate intrusion detection prototypes, most based on machine learning techniques, for nearly a decade. The data set served well in the KDD Cup '99 competition to demonstrate that machine learning can be useful in intrusion detection systems. However, there are discrepancies in the findings reported in the literature. Further, some researchers have published criticisms of the data (and the DARPA data from which the KDD Cup '99 data has been derived), questioning the validity of results obtained with this data. Despite the criticisms, researchers continue to use the data due to a lack of better publicly available alternatives. Hence, it is important to identify the value of the data set and the findings from the extensive body of research based on it, which has largely been ignored by the existing critiques. This paper reports on an empirical investigation, demonstrating the impact of several methodological differences in the publicly available subsets, which uncovers several underlying causes of the discrepancy in the results reported in the literature. These findings allow us to better interpret the current body of research, and inform recommendations for future use of the data set. © 2011 - IOS Press and the authors. All rights reserved.

Source: Scopus

Exploring Discrepancies in Findings Obtained with the KDD Cup '99 Data Set

Authors: Engen, V., Vincent, J. and Phalp, K.T.

Journal: Intelligent Data Analysis

ISSN: 1088-467X

Abstract:

The KDD Cup '99 data set has been widely used to evaluate intrusion detection prototypes, most based on machine learning techniques, for nearly a decade. The data set served well in the KDD Cup '99 competition to demonstrate that machine learning can be useful in intrusion detection systems.

However, there are discrepancies in the findings reported in the literature. Further, some researchers have published criticisms of the data (and the DARPA data from which the KDD Cup '99 data has been derived), questioning the validity of results obtained with this data. Despite the criticisms, researchers continue to use the data due to a lack of better publicly available alternatives. Hence, it is important to identify the value of the data set and the findings from the extensive body of research based on it, which has largely been ignored by the existing critiques. This paper reports on an empirical investigation, demonstrating the impact of several methodological differences in the publicly available subsets, which uncovers several underlying causes of the discrepancy in the results reported in the literature. These findings allow us to better interpret the current body of research, and inform recommendations for future use of the data set.

Source: Manual

Preferred by: Keith Phalp

Exploring discrepancies in findings obtained with the KDD Cup '99 data set.

Authors: Engen, V., Vincent, J. and Phalp, K.

Journal: Intell. Data Anal.

Volume: 15

Pages: 251-276

DOI: 10.3233/IDA-2010-0466

Source: DBLP