
Few-Shot Retrieval-Augmented LLMs for Anomaly Detection in Network Traffic

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Accurate anomaly detection is essential for protecting heterogeneous multi-environment (M-En) networks, where traditional enterprise traffic coexists with Internet of Things (IoT) flows. Existing machine learning-based anomaly detectors rely on extensive labeled data and frequent retraining, and therefore adapt poorly to unseen attack patterns and low-data settings. This work introduces a Retrieval-Augmented Generation (RAG) pipeline that guides Large Language Models (LLMs) to classify malicious traffic in a few-shot setting. Structured packet-level and statistical features are first rendered as natural-language prompts; these prompts are then encoded as dense sentence embeddings and indexed with FAISS (Facebook AI Similarity Search, an approximate-nearest-neighbour vector index) to form a vector-based knowledge base. At inference time, the embedding of an unseen packet is used to retrieve a fixed number of the most semantically similar labeled examples, which are supplied to the LLM as in-context examples. This workflow enables learning over heterogeneous traffic without fine-tuning the model, improves generalization, and yields a human-readable explanation for each decision. With MPNet embeddings and a context of 200 retrieved examples, the 4B-parameter Gemma3:4b model reaches an accuracy of 1.0, while the 7B-parameter Mistral model reaches 0.98. Classifying 50 samples takes 23.51 s with Gemma3:4b and 66.7 s with Mistral, both on a single NVIDIA GeForce RTX 3090 GPU.
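The retrieval step described above (features rendered as text, embedded, indexed, then the nearest labeled examples placed in the LLM prompt) can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: the MPNet encoder is replaced by a toy hashed bag-of-words encoder, and the FAISS index by an exact brute-force inner-product search over L2-normalised vectors (so scores are cosine similarities). All names (`to_prompt`, `embed`, `FlatIndex`, `build_few_shot_prompt`), the feature names, and the prompt template are hypothetical, and the actual LLM call is omitted.

```python
import hashlib
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

DIM = 256  # hypothetical embedding width for the toy encoder


@dataclass
class LabeledFlow:
    text: str             # natural-language rendering of the features
    label: Optional[str]  # e.g. "benign" / "malicious"; None if unseen


def to_prompt(features: Dict[str, object]) -> str:
    """Render structured packet/statistical features as one sentence
    (hypothetical template)."""
    parts = [f"{name} is {value}" for name, value in sorted(features.items())]
    return "Network packet where " + ", ".join(parts) + "."


def embed(text: str) -> List[float]:
    """Toy stand-in for a sentence encoder such as MPNet: a hashed
    bag-of-words vector, L2-normalised so dot product = cosine similarity."""
    vec = [0.0] * DIM
    for token in text.lower().rstrip(".").replace(",", " ").split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "little") % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class FlatIndex:
    """Exact inner-product search over normalised vectors; an assumed
    stand-in for the FAISS index used in the described pipeline."""

    def __init__(self) -> None:
        self.vectors: List[List[float]] = []
        self.items: List[LabeledFlow] = []

    def add(self, item: LabeledFlow) -> None:
        self.vectors.append(embed(item.text))
        self.items.append(item)

    def search(self, query: str, k: int) -> List[Tuple[float, LabeledFlow]]:
        q = embed(query)
        scored = [(sum(a * b for a, b in zip(q, v)), item)
                  for v, item in zip(self.vectors, self.items)]
        # Highest cosine similarity first; ties keep insertion order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def build_few_shot_prompt(query: str,
                          shots: List[Tuple[float, LabeledFlow]]) -> str:
    """Place retrieved labeled examples before the unseen packet
    (hypothetical prompt wording)."""
    lines = ["Classify the last packet as benign or malicious and explain why."]
    for i, (_, shot) in enumerate(shots, start=1):
        lines.append(f"Example {i}: {shot.text}\nLabel: {shot.label}")
    lines.append(f"Query: {query}\nLabel:")
    return "\n\n".join(lines)


# Toy knowledge base of labeled flows (invented values for illustration).
kb = FlatIndex()
kb.add(LabeledFlow(to_prompt({"protocol": "TCP", "dst_port": 443,
                              "flags": "ACK", "bytes": 1200}), "benign"))
kb.add(LabeledFlow(to_prompt({"protocol": "TCP", "dst_port": 23,
                              "flags": "SYN", "bytes": 60}), "malicious"))
kb.add(LabeledFlow(to_prompt({"protocol": "UDP", "dst_port": 1900,
                              "flags": "none", "bytes": 90}), "malicious"))

K = 2  # the abstract reports 200 retrieved examples; 2 keeps output short
unseen = to_prompt({"protocol": "TCP", "dst_port": 23,
                    "flags": "SYN", "bytes": 64})
shots = kb.search(unseen, k=K)
prompt = build_few_shot_prompt(unseen, shots)
print(prompt)  # this text would be sent to the LLM (call not shown)
```

Swapping in a real sentence encoder and a FAISS inner-product index over normalised vectors (for example `faiss.IndexFlatIP`) would keep the same cosine-similarity semantics; the sketch only shows how retrieval turns a labeled knowledge base into in-context examples without any model fine-tuning.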
Original language: English
Title of host publication: Unknown book
Publisher: Springer Science and Business Media Deutschland GmbH
DOIs
State: Published - 2026

