A robust adversarial ensemble with causal (feature interaction) interpretations for image classification

  • Chunheng Zhao
  • Pierluigi Pisu
  • Gurcan Comert
  • Negash Begashaw
  • Varghese Vaidyan
  • Nina Hubig
  • Clemson University College of Engineering, Computing and Applied Sciences
  • Industrial and systems engineering with North Carolina A&T State University
  • Benedict College
  • Dakota State University
  • IT:U Interdisciplinary Transformation University Austria

Research output: Contribution to journal › Article › peer-review

Abstract

Deep learning-based discriminative classifiers, despite their remarkable success, remain vulnerable to adversarial examples that can mislead model predictions. While adversarial training can enhance robustness, it fails to address the intrinsic vulnerability stemming from the opaque nature of these black-box models. In this paper, we present a deep ensemble model that combines discriminative features with generative models to achieve both high classification accuracy and strong adversarial robustness. Our approach integrates a bottom-level pre-trained discriminative network for feature extraction with a top-level generative classification network that models adversarial input distributions through a deep latent variable model. Using variational Bayes, our model achieves superior robustness against diverse white-box adversarial attacks without requiring adversarial training. Extensive experiments on CIFAR-10 and CIFAR-100 demonstrate our model’s superior adversarial robustness. Through evaluations using counterfactual metrics and feature interaction-based metrics, we establish correlations between model interpretability and adversarial robustness. Our architecture’s generative component is generalizable and can serve as an auxiliary network adaptable to various pre-trained discriminative models. We demonstrate this generalizability through experiments on Tiny-ImageNet with different backbone architectures, indicating the potential applicability of our approach to larger-scale classification datasets.
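
As a rough illustration of the generative classification step the abstract describes, the sketch below applies Bayes' rule to per-class log-likelihood estimates of a feature vector. It assumes those estimates (for example, class-conditional ELBO values from a latent variable model) are already available. The function name `generative_predict`, the uniform default prior, and the input layout are hypothetical choices made for this example, not details taken from the paper.

```python
import math


def generative_predict(class_log_likelihoods, class_log_priors=None):
    """Classify with Bayes' rule from per-class log-likelihood estimates.

    class_log_likelihoods[c] approximates log p(z | y = c) for a feature
    vector z, e.g. an ELBO from a latent variable model (assumption).
    Returns (predicted_class_index, posterior_probabilities).
    """
    n = len(class_log_likelihoods)
    if class_log_priors is None:
        # Uniform class prior: an assumption made for this sketch.
        class_log_priors = [-math.log(n)] * n

    # Unnormalized log joint: log p(z | y = c) + log p(y = c).
    joint = [ll + lp for ll, lp in zip(class_log_likelihoods, class_log_priors)]

    # Softmax with max-subtraction for numerical stability.
    m = max(joint)
    exps = [math.exp(j - m) for j in joint]
    total = sum(exps)
    posterior = [e / total for e in exps]

    pred = max(range(n), key=lambda c: posterior[c])
    return pred, posterior
```

Calling `generative_predict([-10.0, -8.0, -12.0])` picks the class with the largest estimated joint log-probability and returns a posterior that sums to one. Because a discriminative softmax head is replaced here by a comparison of class-conditional likelihoods, the prediction depends on how well each class model explains the features.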
Original language: English
Article number: 291
Journal: Machine Learning
Volume: 114
Issue number: 12
State: Published - Dec 1 2025

Keywords

  • Adversarial attacks
  • Causal learning
  • Generative classifier
  • Image classification
