Abstract
Multivariate count data under stopped-sampling schemes arise in ecology, genomics, and related fields. The Dirichlet-Negative Multinomial (DNM) is a natural baseline for such data, but can be brittle: a small fraction of atypical samples can distort the shared Dirichlet mixer, biasing dispersion and degrading predictive fit. We propose the Contaminated Dirichlet-Negative Multinomial (CDNM), a robust extension that introduces a same-mode contamination mechanism at the Dirichlet layer. The construction preserves the modal composition while inflating only the dispersion of contaminants, yielding a two-component mixture with exact likelihoods and transparent interpretation. Estimation is likelihood-based via an expectation-maximization algorithm with closed-form scores and Hessians for the DNM kernel and constrained updates for the contamination parameters. Posterior responsibilities provide calibrated observation-level probabilities of contamination, enabling principled diagnostics without ad hoc trimming. Simulation studies show that the CDNM reduces bias and RMSE in dispersion estimation relative to the DNM across contamination regimes, while coinciding with the DNM for clean data. An application to microbiome counts yields improved information criteria and predictive likelihood, a sharper estimate of typical precision, and stable modal composition. The CDNM thus unifies interpretability, tractability, and robustness for stopped-sampling counts, providing a practical tool for inference and anomaly detection in multivariate count analysis.
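The abstract does not give the CDNM parameterization, so the following is only a minimal sketch of the two-component idea under stated assumptions. It assumes the standard DNM construction: negative multinomial counts with stopping parameter r, whose cell probabilities (p_0, p_1, ..., p_d), including the stopping cell p_0, are Dirichlet(alpha_0, ..., alpha_d). It also assumes a hypothetical mode-preserving contamination. Typical observations use alpha_k = 1 + kappa * m_k, and contaminants use alpha_k = 1 + (kappa / eta) * m_k with eta > 1. Both share the Dirichlet mode m but contaminants are more dispersed. The names `dnm_logpmf`, `mode_alpha`, `responsibilities`, `update_pi`, `kappa`, `eta` and `pi` are illustrative and do not come from the paper. The sketch covers the E-step responsibilities and a plain mixing-weight update. It does not reproduce the closed-form score/Hessian updates or the constrained contamination updates described above.

```python
import math


def dnm_logpmf(x, r, alpha):
    """Log-pmf of a Dirichlet-negative multinomial count vector (assumed standard form).

    x     : observed counts (x_1, ..., x_d)
    r     : negative-multinomial stopping parameter, r > 0
    alpha : Dirichlet parameters (alpha_0, alpha_1, ..., alpha_d), where
            alpha_0 belongs to the stopping cell p_0
    """
    n = sum(x)
    a_total = sum(alpha)
    # Negative-multinomial factor Gamma(r + n) / (Gamma(r) * prod_k x_k!)
    out = math.lgamma(r + n) - math.lgamma(r)
    out -= sum(math.lgamma(xk + 1) for xk in x)
    # Ratio of multivariate Beta functions B(alpha_0 + r, alpha_1 + x_1, ...) / B(alpha)
    out += math.lgamma(alpha[0] + r) - math.lgamma(alpha[0])
    out += sum(math.lgamma(ak + xk) - math.lgamma(ak) for ak, xk in zip(alpha[1:], x))
    out += math.lgamma(a_total) - math.lgamma(a_total + r + n)
    return out


def mode_alpha(mode, kappa):
    """Hypothetical parameterization alpha_k = 1 + kappa * m_k, whose Dirichlet mode is m."""
    return [1.0 + kappa * m for m in mode]


def responsibilities(data, r, mode, kappa, eta, pi):
    """E-step: posterior probability that each count vector is a contaminant."""
    typical = mode_alpha(mode, kappa)
    contam = mode_alpha(mode, kappa / eta)  # same mode, lower concentration when eta > 1
    z = []
    for x in data:
        log_t = math.log1p(-pi) + dnm_logpmf(x, r, typical)
        log_c = math.log(pi) + dnm_logpmf(x, r, contam)
        # Normalize in log space (log-sum-exp) for numerical stability
        m = max(log_t, log_c)
        z.append(math.exp(log_c - m) / (math.exp(log_t - m) + math.exp(log_c - m)))
    return z


def update_pi(z):
    """Unconstrained M-step for the contamination weight, other parameters held fixed."""
    return sum(z) / len(z)


# Usage: mode over (p_0, p_1, p_2); the second count vector is atypical.
data = [[10, 10], [60, 2]]
z = responsibilities(data, r=5, mode=[0.2, 0.4, 0.4], kappa=50, eta=10, pi=0.1)
print(z, update_pi(z))
```

In this setting the atypical vector (60, 2) receives a much larger contamination responsibility than (10, 10). This illustrates how posterior responsibilities can flag individual samples without trimming them from the fit.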
| Original language | English |
|---|---|
| Pages (from-to) | 1--17 |
| Journal | Journal of Applied Statistics |
| DOIs | |
| State | Accepted/In press - Jan 1 2026 |
Keywords
- Dirichlet-Negative multinomial
- Multivariate count data
- contamination modeling
- expectation-maximization algorithm
- robust estimation
- stopped-sampling schemes
Fingerprint
Dive into the research topics of 'Robust analysis of multivariate count data under stopped–sampling designs: the contaminated Dirichlet–Negative multinomial'. Together they form a unique fingerprint.