Reconstructing Classification to Enhance Machine-Learning Based Network Intrusion Detection by Embracing Ambiguity
Communications in Computer and Information Science
Network intrusion detection systems (IDS) has efficiently identified the profiles of normal network activities, extracted intrusion patterns, and constructed generalized models to evaluate (un)known attacks using a wide range of machine learning approaches. In spite of the effectiveness of machine learning-based IDS, it has been still challenging to reduce high false alarms due to data misclassification. In this paper, by using multiple decision mechanisms, we propose a new classification method to identify misclassified data and then to classify them into three different classes, called a malicious, benign, and ambiguous dataset. In other words, the ambiguous dataset contains a majority of the misclassified dataset and is thus the most informative for improving the model and anomaly detection because of the lack of confidence for the data classification in the model. We evaluate our approach with the recent real-world network traffic data, Kyoto2006+ datasets, and show that the ambiguous dataset contains 77.2% of the previously misclassified data. Re-evaluating the ambiguous dataset effectively reduces the false prediction rate with minimal overhead and improves accuracy by 15%.
National Science Foundation
Ensemble classifiers, Machine learning, Network intrusion detection
Chungsik Song, Wenjun Fan, Sang Yoon Chang, and Younghee Park. "Reconstructing Classification to Enhance Machine-Learning Based Network Intrusion Detection by Embracing Ambiguity" Communications in Computer and Information Science (2021): 169-187. https://doi.org/10.1007/978-3-030-72725-3_13