
Robust analysis of multivariate count data under stopped–sampling designs: the contaminated Dirichlet–Negative multinomial

Research output: Contribution to journal › Article › peer-review

Abstract

Multivariate count data under stopped-sampling schemes arise in ecology, genomics, and related fields. The Dirichlet-Negative Multinomial (DNM) is a natural baseline for such data, but can be brittle: a small fraction of atypical samples can distort the shared Dirichlet mixer, biasing dispersion and degrading predictive fit. We propose the Contaminated Dirichlet-Negative Multinomial (CDNM), a robust extension that introduces a same-mode contamination mechanism at the Dirichlet layer. The construction preserves the modal composition while inflating only the dispersion of contaminants, yielding a two-component mixture with exact likelihoods and transparent interpretation. Estimation is likelihood-based via an expectation-maximization algorithm with closed-form scores and Hessians for the DNM kernel and constrained updates for the contamination parameters. Posterior responsibilities provide calibrated observation-level probabilities of contamination, enabling principled diagnostics without ad hoc trimming. Simulation studies show that the CDNM reduces bias and RMSE in dispersion estimation relative to the DNM across contamination regimes, while coinciding with the DNM for clean data. An application to microbiome counts yields improved information criteria and predictive likelihood, a sharper estimate of typical precision, and stable modal composition. The CDNM thus unifies interpretability, tractability, and robustness for stopped-sampling counts, providing a practical tool for inference and anomaly detection in multivariate count analysis.
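
The abstract does not state the model's parameterization, so the following is only a minimal sketch of how a same-mode, two-component contamination mixture and its posterior responsibilities could be computed. It rests on several assumptions that are not taken from the article: the DNM is taken to be a negative multinomial with stopping count `r` whose probability vector has a Dirichlet prior; the Dirichlet is written in a hypothetical mode form `alpha_i = 1 + c * theta_i`, with `theta` the modal composition and `c` the concentration; and contaminants are assumed to share `theta` but have concentration `c / eta` with `eta > 1`. The names `dnm_logpmf`, `alpha_from_mode`, `contamination_responsibility`, `delta`, and `eta` are illustrative.

```python
from math import exp, lgamma, log


def log_multibeta(a):
    """Log of the multivariate beta function B(a)."""
    return sum(lgamma(v) for v in a) - lgamma(sum(a))


def dnm_logpmf(x, r, alpha):
    """Log-pmf of a Dirichlet-mixed negative multinomial (assumed form).

    x     : counts for the K observed categories
    r     : stopping count (occurrences of the stopping category)
    alpha : K+1 Dirichlet parameters; alpha[0] belongs to the stopping category
    """
    n = sum(x)
    updated = [alpha[0] + r] + [a + xi for a, xi in zip(alpha[1:], x)]
    return (lgamma(r + n) - lgamma(r) - sum(lgamma(xi + 1) for xi in x)
            + log_multibeta(updated) - log_multibeta(alpha))


def alpha_from_mode(theta, c):
    """Hypothetical mode parameterization: Dirichlet with mode theta, concentration c."""
    return [1.0 + c * t for t in theta]


def contamination_responsibility(x, r, theta, c, delta, eta):
    """Posterior probability that x came from the inflated-dispersion component.

    delta : assumed mixing weight of contaminants, 0 < delta < 1
    eta   : assumed inflation factor (> 1 lowers concentration, keeps the mode)
    """
    log_good = log(1.0 - delta) + dnm_logpmf(x, r, alpha_from_mode(theta, c))
    log_bad = log(delta) + dnm_logpmf(x, r, alpha_from_mode(theta, c / eta))
    m = max(log_good, log_bad)  # subtract the max for numerical stability
    w_good, w_bad = exp(log_good - m), exp(log_bad - m)
    return w_bad / (w_good + w_bad)


theta = [0.4, 0.3, 0.2, 0.1]  # modal composition, stopping category first
p = contamination_responsibility([12, 7, 3], r=5, theta=theta,
                                 c=50.0, delta=0.1, eta=10.0)
print(p)
```

Under these assumptions, setting `eta = 1` makes the two components identical, so the responsibility reduces to the mixing weight `delta`. In an EM fit, responsibilities of this kind would be the E-step quantities; the article's actual updates and constraints may differ from this sketch.
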

Original language: English
Pages (from-to): 1-17
Journal: Journal of Applied Statistics
State: Accepted/In press - Jan 1 2026

Keywords

  • Dirichlet-Negative multinomial
  • Multivariate count data
  • contamination modeling
  • expectation-maximization algorithm
  • robust estimation
  • stopped-sampling schemes
