
As modern enterprise and cloud environments scale, the complexity and volume of network traffic increase dramatically. NetFlow records metadata about the traffic flows traversing a network device such as a router, switch, or host. NetFlow data, essential for understanding network traffic, can be effectively modeled as graphs whose edges capture properties such as connection duration and data volume, while nodes, which represent hosts, carry no intrinsic properties.
The sheer volume of NetFlow data, up to tens of millions of events per second, makes traditional anomaly detection methods inefficient. Identifying attacks is particularly challenging when analyzing individual connections in isolation, as they often appear normal without broader context. By leveraging the graph structure, it is possible to add topological context, making it easier to identify anomalous patterns.
In this post, we discuss a novel way to apply an autoencoder-based graph neural network (GNN) to detect anomalies in massive NetFlow data.
Traditional anomaly detection solutions may rely on static thresholds or simple feature engineering that fail to adapt to the evolving nature of malicious behavior. Current solutions also may not achieve the inference speed and scalability required to process tens of millions of network flows per second in real time.
Existing GNN-based solutions for anomaly detection include basic graph embedding techniques, conventional GNN-based detectors, and autoencoder architectures applied to network data. However, these methods often have the following issues:
- Lack of hierarchical graph structures to capture multi-level patterns.
- Unreliable detection coverage when limited to the standard NetFlow 5-tuple (source IP, destination IP, duration, source bytes, and destination bytes). These methods often require much richer metadata, such as IP reputation or external threat intelligence, which is difficult, if not impossible, to acquire in production-scale scenarios.
- Simplistic node features that do not fully exploit the IP address space or neighborhood embeddings.
- Inability to achieve both high detection accuracy and the extremely high throughput needed for large-scale, real-time analysis.
- Limited flexibility and scaling potential, failing to maintain low false-positive rates at high traffic volumes.
GNN: A graph autoencoder
We present a novel GNN-based pipeline built around a graph autoencoder (GAE) tailored to detect anomalies in massive, dynamic NetFlow graphs.

Figure 1 shows that flows are chunked into sequences, followed by the creation of the graph structure. Each node is assigned a feature vector based on the IP address, while edges are defined by flow properties.
Building a graph
The first step in building the graph involves organizing the data into manageable sequences. Flows are divided into sequences based on a specified graph size, which, for this purpose, is set to 200K flows per sequence. Each sequence of flows undergoes further processing described later in this post to form a single graph that serves as model input.
At inference time, there is no overlap between subsequent batches of flows, so each graph is created independently of the previous graph.
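To make the chunking step concrete, here is a minimal sketch, assuming the flows arrive as a pandas DataFrame; the `GRAPH_SIZE` constant and the generator pattern are illustrative, not the exact implementation.

```python
import pandas as pd

GRAPH_SIZE = 200_000  # flows per sequence, as described above

def chunk_flows(flows: pd.DataFrame, graph_size: int = GRAPH_SIZE):
    """Yield non-overlapping chunks of flows; each chunk becomes one graph.

    Chunks do not overlap, so each graph is built independently of the
    previous one, matching the inference-time behavior described above.
    """
    for start in range(0, len(flows), graph_size):
        yield flows.iloc[start:start + graph_size]
```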
After the flows are chunked, the basic structure of the graph is formed. Each unique IP address within the flows is treated as a node, and each flow between IP addresses forms an edge. This structure creates the skeleton of the graph, which is subsequently encapsulated within a PyTorch Geometric graph data object.
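The skeleton construction might look like the following sketch; the `src_ip` and `dst_ip` column names are assumptions, and the real pipeline may build the graph differently.

```python
import numpy as np
import pandas as pd
import torch
from torch_geometric.data import Data

def build_graph(chunk: pd.DataFrame):
    """Map each unique IP to a node index; each flow becomes a directed edge."""
    ips = pd.unique(chunk[["src_ip", "dst_ip"]].to_numpy().ravel())
    ip_to_idx = {ip: i for i, ip in enumerate(ips)}
    src = chunk["src_ip"].map(ip_to_idx).to_numpy()
    dst = chunk["dst_ip"].map(ip_to_idx).to_numpy()
    edge_index = torch.from_numpy(np.stack([src, dst])).long()
    # Repeated flows between the same pair remain distinct directed edges.
    return Data(edge_index=edge_index, num_nodes=len(ips)), ip_to_idx
```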
Given the edge-centric nature of flows, the next focus is on the nodes.
Each node, representing an IP address, is initially assigned a vector embedding derived from the IP address’s octets. These initial embeddings are refined through an iterative process where each node’s vector embedding is averaged with the embeddings of its neighboring nodes. This averaging continues until the embeddings converge, indicating minimal change between iterations. The final embeddings serve as the features for each node in the graph.
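A sketch of this node-feature step follows, assuming IPv4 addresses; the exact blend ratio and convergence threshold are assumptions, since the post only specifies octet-derived embeddings averaged with neighbors until convergence.

```python
import numpy as np

def ip_octet_features(ips):
    """Four normalized octets per IPv4 address form the initial embedding."""
    return np.array([[int(o) / 255.0 for o in ip.split(".")] for ip in ips],
                    dtype=np.float32)

def smooth_with_neighbors(x, src, dst, tol=1e-4, max_iters=50):
    """Iteratively blend each node's embedding with its neighbors' mean.

    The 50/50 blend over undirected neighborhoods is an assumption; the
    post only states that averaging repeats until embeddings converge.
    """
    n = x.shape[0]
    for _ in range(max_iters):
        neigh_sum = np.zeros_like(x)
        deg = np.zeros(n, dtype=np.float32)
        np.add.at(neigh_sum, dst, x[src])   # src -> dst contributions
        np.add.at(neigh_sum, src, x[dst])   # treat edges as undirected
        np.add.at(deg, dst, 1.0)
        np.add.at(deg, src, 1.0)
        x_new = 0.5 * (x + neigh_sum / np.maximum(deg, 1.0)[:, None])
        if np.abs(x_new - x).max() < tol:   # converged: minimal change
            return x_new
        x = x_new
    return x
```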
For each flow, corresponding to an edge in the graph, three key properties are defined:
- Forward bytes: Number of bytes sent from the source to the destination.
- Backward bytes: Bytes sent from the destination back to the source.
- Flow duration: Captures the time duration of the flow.
These properties provide essential context for each edge, enriching the graph and enabling the GNN to better understand the interactions between nodes. If multiple connections exist between two IP addresses, each connection creates a distinct directed edge on the graph.
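Assembling those three properties into an edge-attribute tensor might look like this; the column names and the log scaling are assumptions.

```python
import torch

EDGE_COLUMNS = ["fwd_bytes", "bwd_bytes", "duration"]  # assumed column names

def edge_attributes(chunk):
    """Stack the three flow properties into a [num_edges, 3] tensor."""
    attrs = chunk[EDGE_COLUMNS].to_numpy(dtype="float32")
    # log1p tames heavy-tailed byte counts; the actual scaling is an assumption
    return torch.log1p(torch.from_numpy(attrs))
```

The result is attached to the graph as `data.edge_attr`, alongside the `edge_index` built earlier.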
Graph structure
NetFlow data, often massive in scale, presents a significant challenge due to the limited availability of labeled datasets and the time-intensive process of labeling.
In real-world scenarios, network analysts frequently encounter unlabeled data, particularly when striving for real-time anomaly detection. This makes supervised learning methods impractical, as they rely heavily on labeled examples for training.
Unsupervised learning models, by contrast, are crucial for NetFlow anomaly detection because they do not require labeled data and can identify patterns, deviations, or anomalies based on the inherent structure of the data. Such models are well suited for uncovering unusual patterns or behaviors in network traffic, which often signal security threats or performance issues.
By using techniques such as clustering and density estimation, unsupervised models can efficiently handle the dynamic and evolving nature of NetFlow data, enabling effective detection of anomalies in both known and novel scenarios without the need for extensive manual intervention.
Figure 2 summarizes the general pipeline architecture.

Flow graphs are first encoded using graph encoder layers, which produce a node embedding vector for each node on the graph based on its topology and features. The node embeddings and edge properties are then used to reconstruct the original connectivity structure of the graph (adjacency matrix) by producing an existence probability for every edge on the graph. Finally, we compute the anomaly score as the complement of the existence probability.
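As a simplified sketch of the scoring step: with node embeddings `z` from the encoder, an inner-product decoder can produce an existence probability per edge, and the anomaly score is its complement. The actual decoder also folds in edge-level and global edge embeddings, which are reduced here to an optional `edge_context` term.

```python
import torch

def edge_anomaly_scores(z, edge_index, edge_context=None):
    """Anomaly score for each observed edge: 1 - P(edge exists)."""
    src, dst = edge_index
    # Inner-product decoder on node embeddings (a simplifying assumption;
    # the full model uses edge embeddings with global edge context).
    logits = (z[src] * z[dst]).sum(dim=-1)
    if edge_context is not None:
        logits = logits + edge_context
    p_exist = torch.sigmoid(logits)        # probability the edge exists
    return 1.0 - p_exist                   # complement = anomaly score
```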
Key innovations in our approach
- Graph U-Net integration: We incorporate a Graph U-Net within the encoder to learn hierarchical and multi-resolution embeddings of network traffic patterns, improving the model’s capacity to detect subtle anomalies (see the encoder sketch after this list).
- Global edge embeddings in decoding: We enhance the autoencoder’s reconstruction step by introducing edge-level embeddings combined with global edge context. This facilitates more accurate anomaly scoring on edges, which correspond to network flows.
- IP-octet–based node feature engineering: Instead of treating IP addresses as arbitrary IDs, we decompose them into octets, normalize them, and combine these features with neighbor-based embeddings. This approach captures meaningful structure and semantics from real-world network addressing, increasing the model’s sensitivity to suspicious activities.
- Output anomaly score as a probability: Following the reconstruction of the adjacency matrix, we calculate the probability of an edge’s existence. The anomaly score is subsequently defined as the complement of this probability, reflecting the likelihood of the edge not existing. The architecture of the model enables every distinct edge on the graph to have a different anomaly score that’s informed by its topology on the graph as well as node and edge properties.
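For the Graph U-Net item above, PyTorch Geometric ships a ready-made `GraphUNet` module that can serve as the hierarchical encoder; the hyperparameters below are illustrative, not the values used in our model.

```python
import torch
from torch_geometric.nn import GraphUNet

# Illustrative sizes: 4-dim IP-octet features in, 32-dim node embeddings out.
encoder = GraphUNet(in_channels=4,
                    hidden_channels=64,
                    out_channels=32,
                    depth=3)  # pooling/unpooling levels for multi-resolution

def encode(data):
    """Produce hierarchical, multi-resolution node embeddings."""
    return encoder(data.x, data.edge_index)
```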
Performance metrics and comparison
Our GAE outperforms the current best-performing baseline, Anomal-E, the state-of-the-art GNN model on our test datasets. Like our approach, Anomal-E uses edge features and graph topological structure in a self-supervised manner.
We evaluated both our GAE model and Anomal-E on the same datasets. Our GAE model outperformed Anomal-E in terms of true positive rate (TPR) and false positive rate (FPR) (Table 1).
| Dataset | TPR | FPR | Total flows / anomalous flows | Classes | Previous baseline TPR/FPR |
| --- | --- | --- | --- | --- | --- |
| NF-CICIDS-2018 | 87% | 15% | 8.4M / 1.0M | 6 | 88% / 29% |
| NF-UNSW-NB15 | 98% | 2% | 1.6M / 72K | 9 | 79% / 0.2% |
| NF-ToN-IOT | 78% | 4% | 1.4M / 1.1M | 9 | 74% / 57% |
| NF-BoT-IOT | 40% | 2% | 600K / 586K | 4 | 46% / 60% |
This improvement has significant implications for real-world anomaly detection tasks, particularly in cybersecurity applications.
A higher TPR indicates that our model can correctly identify a greater proportion of actual anomalies, which is critical for detecting malicious activities such as unauthorized access, insider threats, or network intrusions.
Equally important is the GAE model’s lower FPR, which reduces the number of normal interactions mistakenly flagged as anomalies. This is particularly valuable in practical applications, where false positives can be costly and time-consuming for security teams to investigate. By minimizing false alarms, our model enables analysts to focus their attention on genuine threats, improving operational efficiency and ensuring that resources are allocated to high-priority issues.
The balance between TPR and FPR is crucial in anomaly detection, as optimizing one metric often comes at the expense of the other. A model with a high TPR but a high FPR may overwhelm analysts with excessive false positives, while a model with a low FPR but a low TPR risks missing critical threats.
Our GAE model’s ability to outperform Anomal-E on both metrics demonstrates its effectiveness in achieving a better overall balance, making it more reliable and practical for real-world deployments.
GAE accelerated by NVIDIA Morpheus
A natural question about any high-throughput network pipeline is its computational efficiency. With Morpheus fully integrated, the GAE offers near-real-time inference throughput (Figure 3).

In Figure 3, the results show that Morpheus significantly accelerates model performance, achieving much higher throughput compared to CPU-only execution, and outperforming GPU sequential processing across all batch sizes.
When tested on an NVIDIA A100 GPU, for a batch size of 2.5M and 32 batches, the Morpheus pipeline achieves up to 2.5M rows per second in near-real-time throughput. Compared to the GPU-accelerated baseline, the Morpheus pipeline also reduces attacker dwell time by 78%.
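For context on the rows-per-second figure, throughput over batched inference can be measured roughly as follows; this is a generic benchmark sketch, not Morpheus code.

```python
import time

def measure_throughput(infer_fn, batches):
    """Rows per second across batched inference calls."""
    total_rows = 0
    start = time.perf_counter()
    for batch in batches:
        infer_fn(batch)                 # e.g., encode + score one graph
        total_rows += len(batch)
    return total_rows / (time.perf_counter() - start)
```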
Learn more
This approach demonstrates that a GNN-based autoencoder, especially when combined with hierarchical and multi-resolution embeddings via U-Net integration, global edge embeddings, and advanced node feature engineering, can deliver highly accurate and scalable anomaly detection on massive NetFlow datasets. By achieving a strong balance between true positive and false positive rates and leveraging accelerated inference pipelines via NVIDIA Morpheus, this solution addresses the core challenges of real-time, large-scale network security analytics.
For more information, see the detailed GNN-based Autoencoder for Netflow Anomaly Detection Using NVIDIA Morpheus example on GitHub.