TY - JOUR
T1 - Secure cyber defense
T2 - An analysis of network intrusion-based dataset ccd-idsv1 with machine learning and deep learning models
AU - Thapa, Niraj
AU - Liu, Zhipeng
AU - Shaver, Addison
AU - Esterline, Albert
AU - Gokaraju, Balakrishna
AU - Roy, Kaushik
N1 - Publisher Copyright:
© 2021 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2021/8/1
Y1 - 2021/8/1
N2 - Anomaly detection and multi-attack classification are major concerns for cyber defense. Several publicly available datasets have been used extensively for the evaluation of Intrusion Detection Systems (IDSs). However, most of the publicly available datasets may not contain attack scenarios based on evolving threats. The development of a robust network intrusion dataset is vital for network threat analysis and mitigation. Proactive IDSs are required to tackle ever-growing threats in cyberspace. Machine learning (ML) and deep learning (DL) models have been deployed recently to detect the various types of cyber-attacks. However, current IDSs struggle to attain both a high detection rate and a low false alarm rate. To address these issues, we first develop a Center for Cyber Defense (CCD)-IDSv1 labeled flow-based dataset in an OpenStack environment. Five different attacks with normal usage imitating real-life usage are implemented. The number of network features is increased to overcome the shortcomings of the previous network flow-based datasets such as CIDDS and CIC-IDS2017. Secondly, this paper presents a comparative analysis on the effectiveness of different ML and DL models on our CCD-IDSv1 dataset. In this study, we consider both cyber anomaly detection and multi-attack classification. To improve the performance, we developed two DL-based ensemble models: Ensemble-CNN-10 and Ensemble-CNN-LSTM. Ensemble-CNN-10 combines 10 CNN models developed from 10-fold cross-validation, whereas Ensemble-CNN-LSTM combines base CNN and LSTM models. This paper also presents feature importance for both anomaly detection and multi-attack classification. Overall, the proposed ensemble models performed well in both the 10-fold cross-validation and independent testing on our dataset. Together, these results suggest the robustness and effectiveness of the proposed IDSs based on ML and DL models on the CCD-IDSv1 intrusion detection dataset.
AB - Anomaly detection and multi-attack classification are major concerns for cyber defense. Several publicly available datasets have been used extensively for the evaluation of Intrusion Detection Systems (IDSs). However, most of the publicly available datasets may not contain attack scenarios based on evolving threats. The development of a robust network intrusion dataset is vital for network threat analysis and mitigation. Proactive IDSs are required to tackle ever-growing threats in cyberspace. Machine learning (ML) and deep learning (DL) models have been deployed recently to detect the various types of cyber-attacks. However, current IDSs struggle to attain both a high detection rate and a low false alarm rate. To address these issues, we first develop a Center for Cyber Defense (CCD)-IDSv1 labeled flow-based dataset in an OpenStack environment. Five different attacks with normal usage imitating real-life usage are implemented. The number of network features is increased to overcome the shortcomings of the previous network flow-based datasets such as CIDDS and CIC-IDS2017. Secondly, this paper presents a comparative analysis on the effectiveness of different ML and DL models on our CCD-IDSv1 dataset. In this study, we consider both cyber anomaly detection and multi-attack classification. To improve the performance, we developed two DL-based ensemble models: Ensemble-CNN-10 and Ensemble-CNN-LSTM. Ensemble-CNN-10 combines 10 CNN models developed from 10-fold cross-validation, whereas Ensemble-CNN-LSTM combines base CNN and LSTM models. This paper also presents feature importance for both anomaly detection and multi-attack classification. Overall, the proposed ensemble models performed well in both the 10-fold cross-validation and independent testing on our dataset. Together, these results suggest the robustness and effectiveness of the proposed IDSs based on ML and DL models on the CCD-IDSv1 intrusion detection dataset.
KW - CART
KW - CCD-IDSv1
KW - CNN
KW - Deep learning
KW - Ensemble
KW - Intrusion detection system
KW - KNN
KW - LSTM
KW - Machine learning
KW - RF
KW - XGBoost
UR - https://www.scopus.com/pages/publications/85110526230
U2 - 10.3390/electronics10151747
DO - 10.3390/electronics10151747
M3 - Article
SN - 2079-9292
VL - 10
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 15
M1 - 1747
ER -