
Hybrid Deep Machine Learning Feature Selection for High-Dimensional Cybersecurity Data

  • North Carolina Agricultural and Technical State University
  • Industrial and systems engineering with North Carolina A&T State University

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

The rapid increase in cyber threats has heightened the demand for Intrusion Detection Systems (IDS) that are both accurate and efficient. While deep learning models outperform traditional machine learning models in identifying complex attack patterns, their effectiveness is often constrained by high-dimensional feature spaces, reduced interpretability, and increased computational cost. To address these limitations, we propose a novel IDS framework, Hybrid Deep Machine Learning Feature Selection (HDMLFS), which combines the sensitivity of Integrated Gradients (IG) to feature perturbations with the global consistency of SHapley Additive exPlanations (SHAP) feature importance, yielding a more robust feature selection process that maintains high detection performance. First, a correlation-based algorithm removes redundant features by analyzing the upper triangular part of the correlation matrix and discarding the less informative feature from each highly correlated pair. Next, a voting-based algorithm combines the IG and SHAP rankings to identify the most informative features, ensuring that at least half of the features are retained while maximizing relevance. The framework was evaluated on the NSL-KDD and CSE-CIC-IDS2018 datasets, where it reduced the feature space by 48% and 65%, respectively. Models trained on the selected features demonstrated superior performance, with ResNet-SF achieving the best results: 98.23% weighted accuracy on CSE-CIC-IDS2018, including 86.96% recall for rare Web attacks, and 99.77% accuracy on NSL-KDD, including 80.77% recall for the rare U2R attack. These results highlight the effectiveness of HDMLFS in improving detection capability while reducing model complexity, supporting efficient and interpretable IDS solutions.
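The two selection stages described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' HDMLFS implementation. The correlation threshold (0.95) and the per-feature `informativeness` score used to decide which member of a correlated pair to discard are assumptions. The voting rule in the hypothetical `vote_select` function is also a stand-in: each method votes for its top half, features with both votes are kept first, and the list is topped up by average rank until at least half are retained. IG and SHAP importances are taken as precomputed inputs (for example, mean absolute attribution per feature) rather than computed here, and all feature names and values are toy data.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def drop_correlated(columns: Dict[str, List[float]],
                    informativeness: Dict[str, float],
                    threshold: float = 0.95) -> List[str]:
    """Stage 1: visit each pair (i, j) with i < j, i.e. the upper triangle of the
    correlation matrix, and drop the less informative feature of any pair whose
    |correlation| exceeds `threshold` (threshold and score are assumptions)."""
    names = list(columns)
    dropped = set()
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a, b = names[i], names[j]
            if a in dropped or b in dropped:
                continue  # pair already resolved by an earlier drop
            if abs(pearson(columns[a], columns[b])) > threshold:
                dropped.add(a if informativeness[a] < informativeness[b] else b)
    return [name for name in names if name not in dropped]


def vote_select(ig_scores: Dict[str, float],
                shap_scores: Dict[str, float]) -> List[str]:
    """Stage 2 (hypothetical voting rule): each method votes for its top half of
    features; unanimous features are kept first, then the best average-rank
    features are added until at least half of the features are retained."""
    names = list(ig_scores)
    k = math.ceil(len(names) / 2)
    ig_rank = sorted(names, key=lambda f: ig_scores[f], reverse=True)
    shap_rank = sorted(names, key=lambda f: shap_scores[f], reverse=True)
    votes = {f: int(f in ig_rank[:k]) + int(f in shap_rank[:k]) for f in names}
    avg_pos = {f: (ig_rank.index(f) + shap_rank.index(f)) / 2 for f in names}
    by_avg = sorted(names, key=lambda f: avg_pos[f])
    selected = [f for f in by_avg if votes[f] == 2]
    for f in by_avg:
        if len(selected) >= k:
            break
        if f not in selected:
            selected.append(f)
    return sorted(selected, key=lambda f: avg_pos[f])


# Toy data (made up for illustration): "dur_copy" duplicates "duration".
columns = {
    "duration": [1, 2, 3, 4, 5],
    "dur_copy": [2, 4, 6, 8, 10],
    "bytes": [5, 3, 4, 1, 2],
    "flag": [0, 1, 0, 1, 1],
}
informativeness = {"duration": 0.8, "dur_copy": 0.3, "bytes": 0.5, "flag": 0.6}
kept = drop_correlated(columns, informativeness)

# Precomputed global importances (e.g. mean |attribution|), also made up.
ig = {"duration": 0.9, "bytes": 0.2, "flag": 0.5}
shap_imp = {"duration": 0.7, "bytes": 0.8, "flag": 0.1}
selected = vote_select({f: ig[f] for f in kept}, {f: shap_imp[f] for f in kept})
print(kept, selected)
```

On this toy input, the first stage drops `dur_copy` (perfectly correlated with the more informative `duration`), and the second stage keeps two of the three remaining features, which satisfies the at-least-half constraint. Skipping pairs that involve an already-dropped feature makes the pass greedy, so the result can depend on column order. This is one simple design choice among several.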
Original language: English
Pages (from-to): 172136-172156
Number of pages: 21
Journal: IEEE Access
Volume: 13
State: Published - Jan 1 2025

Keywords

  • CNN
  • Deep learning
  • ResNet
  • SHAP
  • feature selection
  • hybrid machine learning
  • integrated gradient

