Exploring discrepancies in findings obtained with the KDD Cup '99 data set
Authors: Engen, V., Vincent, J. and Phalp, K.
Journal: Intelligent Data Analysis
Volume: 15
Issue: 2
Pages: 251-276
eISSN: 1571-4128
ISSN: 1088-467X
DOI: 10.3233/IDA-2010-0466
Abstract: The KDD Cup '99 data set has been widely used to evaluate intrusion detection prototypes, most based on machine learning techniques, for nearly a decade. The data set served well in the KDD Cup '99 competition to demonstrate that machine learning can be useful in intrusion detection systems. However, there are discrepancies in the findings reported in the literature. Further, some researchers have published criticisms of the data (and the DARPA data from which the KDD Cup '99 data has been derived), questioning the validity of results obtained with this data. Despite the criticisms, researchers continue to use the data due to a lack of better publicly available alternatives. Hence, it is important to identify the value of the data set and the findings from the extensive body of research based on it, which has largely been ignored by the existing critiques. This paper reports on an empirical investigation, demonstrating the impact of several methodological differences in the publicly available subsets, which uncovers several underlying causes of the discrepancy in the results reported in the literature. These findings allow us to better interpret the current body of research, and inform recommendations for future use of the data set. © 2011 - IOS Press and the authors. All rights reserved.
Source: Scopus
Exploring Discrepancies in Findings Obtained with the KDD Cup '99 Data Set
Authors: Engen, V., Vincent, J. and Phalp, K.T.
Journal: Intelligent Data Analysis
ISSN: 1088-467X
Abstract: The KDD Cup '99 data set has been widely used to evaluate intrusion detection prototypes, most based on machine learning techniques, for nearly a decade. The data set served well in the KDD Cup '99 competition to demonstrate that machine learning can be useful in intrusion detection systems. However, there are discrepancies in the findings reported in the literature. Further, some researchers have published criticisms of the data (and the DARPA data from which the KDD Cup '99 data has been derived), questioning the validity of results obtained with this data. Despite the criticisms, researchers continue to use the data due to a lack of better publicly available alternatives. Hence, it is important to identify the value of the data set and the findings from the extensive body of research based on it, which has largely been ignored by the existing critiques. This paper reports on an empirical investigation, demonstrating the impact of several methodological differences in the publicly available subsets, which uncovers several underlying causes of the discrepancy in the results reported in the literature. These findings allow us to better interpret the current body of research, and inform recommendations for future use of the data set.
Source: Manual
Preferred by: Keith Phalp