Dataset for Federated Learning for Anomaly Detection in Open RAN: Security Architecture Within a Digital Twin
Download our dataset and code:
Please use the following link to download the datasets: ORAN KPIs for baseline and anomalous traffic
Please use the following link to access the GitHub repository used to generate this dataset: TRACTOR/utils
This dataset was used for the paper "Federated Learning for Anomaly Detection in Open RAN: Security Architecture Within a Digital Twin" published at the EuCNC & 6G Summit, March 2024. Please use this link to download the paper. Any use of this dataset which results in an academic publication or other publication which includes a bibliography should include a citation to our paper. Here is the reference for the work:
Abstract:
The Open Radio Access Network (Open RAN) specifies the evolution of RAN with a disaggregated, open, and intelligent architecture to meet the requirements of next-generation networks. While this provides flexibility and optimization for RAN, it raises new security concerns, potentially increasing vulnerability to cyber threats through disaggregated elements. We introduce a security architecture that functions as a platform to evaluate configurations and train security algorithms within a Network Digital Twin (NDT), which is compliant with the O-RAN architecture defined by the O-RAN Alliance. The elements of the security architecture reside within the NDT and facilitate the training of machine learning (ML) models, which play a pivotal role in O-RAN security operations. To exemplify this framework, we demonstrate a hierarchical Federated Learning (FL) based anomaly detection algorithm that can be applied for three traffic slices in O-RAN. We use Colosseum, an O-RAN-compliant emulation system, to generate time-series data for training. Our trained model is able to detect anomalous traffic and identify traffic slice types with over 99% accuracy.
Figure: Proposed security architecture for O-RAN with NDT.
Dataset Description:
We consider an experimental architecture in which one or more gNBs (i.e., a combination of RU, DU, and CU) connect to one near-RT RIC over an E2 interface. Each gNB can support multiple traffic slices. In our experiment, we use three broad 5G slices: enhanced mobile broadband (eMBB), massive machine-type communication (mMTC), and ultra-reliable low-latency communication (URLLC). UEs are assigned to the appropriate traffic slices. The gNB records a wide range of Key Performance Indicators (KPIs) and periodically reports them to an xApp in the near-RT RIC. For each traffic slice, we generate both normal traffic and attack traffic that contains anomalies.
Normal Traffic Class (Baseline)
For the normal traffic class, real-world 5G traces were collected under a variety of conditions for each traffic slice (i.e., eMBB, mMTC, URLLC) and stored in the security data collector. The packet arrival rates and payload sizes are based on real user traffic and do not closely follow a statistical distribution. We therefore consider the UEs to have a non-Independent and Identically Distributed (non-IID) data distribution. These traces are replayed in an O-RAN-compliant emulation environment using Colosseum to generate realistic KPIs, so the Colosseum emulator serves as the basic network model in our NDT. The KPIs are reported on a per-UE basis to the xApp operating in the near-RT RIC every 250 ms. In this way, we generate a robust dataset for the normal traffic class that represents a wide range of real user traffic patterns. More details on the normal traffic class can be found in TRACTOR: Traffic Analysis and Classification Tool for Open RAN.
Anomalous Traffic Class
To generate the anomalous traffic class, we develop two distinct attack models. The first model is a User Datagram Protocol (UDP) Distributed Denial of Service (DDoS) attack, in which an attacker UE floods the gNB with a large volume of UDP packets, thereby degrading system performance. To create this attack, we first examine Packet Capture (PCAP) files from the malicious traffic dataset available here. Drawing on this traffic analysis, we devise a statistical approach to simulate a UDP DoS attack within our Colosseum-based O-RAN environment. In this simulation, we model the DDoS attack by having each UE generate packets whose arrivals follow a Poisson distribution with parameter \( \lambda \) and whose sizes follow a Normal distribution. In our experimental scenario, we set \( \lambda = 3.3 \times 10^{-5} \) seconds. For packet sizes, we use two distinct distributions: \( U_1 \sim N(404, 100) \) and \( U_2 \sim N(1400, 1600) \), in bytes. We call this simulated attack UDP_Poisson.
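As a rough illustration of the UDP_Poisson model, the sketch below draws a packet schedule using only the Python standard library. It is not the generator used to build the dataset. It assumes that \( \lambda \) is the mean gap between packets in seconds (so inter-arrival times are exponential with rate \( 1/\lambda \)) and that the second parameter of \( N(\cdot, \cdot) \) is a variance. The function name `udp_poisson_schedule` and the clipping bounds are hypothetical.

```python
import random

# Lambda from the text, read here as the MEAN GAP between packets in seconds.
# This reading is an assumption; the text gives the value in seconds.
MEAN_INTERARRIVAL_S = 3.3e-5

# (mean, second parameter) in bytes. The second parameter is treated as a
# variance, which is also an assumption.
SIZE_PROFILES = {"U1": (404.0, 100.0), "U2": (1400.0, 1600.0)}

MAX_UDP_PAYLOAD = 65507  # largest UDP payload over IPv4


def udp_poisson_schedule(n_packets, profile="U1", seed=0):
    """Return a list of (send_time_s, payload_bytes) for one attacker UE."""
    rng = random.Random(seed)
    mean, variance = SIZE_PROFILES[profile]
    std = variance ** 0.5
    now = 0.0
    schedule = []
    for _ in range(n_packets):
        # Poisson arrivals are equivalent to exponential inter-arrival gaps.
        now += rng.expovariate(1.0 / MEAN_INTERARRIVAL_S)
        size = int(round(rng.gauss(mean, std)))
        # Clip to a valid payload size.
        size = max(1, min(MAX_UDP_PAYLOAD, size))
        schedule.append((now, size))
    return schedule
```

Calling `udp_poisson_schedule(5000, "U2")` gives a schedule for the second size profile. Under these assumptions one UE emits roughly 30,000 packets per second, which is the flooding behaviour the attack is meant to reproduce.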
The second model introduces a more sophisticated attack variant, the bandwidth hog attack. This attack attempts to disguise a DDoS attack by closely mimicking realistic packet arrival rates while using artificially large packet sizes, which leads to network congestion. To generate this attack, we use the original user traces but increase the payload size by adding \( D = 70 + X \) bytes, where \( X \sim N(30, 100) \). All of the simulated attacks are stored in the security data collector.
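The padding step of the bandwidth hog attack can be sketched as follows. This is a minimal illustration rather than the authors' tooling. It assumes a trace is a list of `(timestamp_s, payload_bytes)` pairs, that a fresh \( X \) is drawn for every packet, and that the second parameter of \( N(30, 100) \) is a variance. The function name `bandwidth_hog` is hypothetical.

```python
import random


def bandwidth_hog(trace, seed=0):
    """Pad every packet of a benign trace by D = 70 + X bytes.

    trace: list of (timestamp_s, payload_bytes) pairs (assumed format).
    Timestamps are left untouched, so arrival rates still look realistic.
    """
    rng = random.Random(seed)
    std = 100.0 ** 0.5  # X ~ N(30, 100), second parameter read as variance
    padded = []
    for timestamp, size in trace:
        extra = int(round(70 + rng.gauss(30.0, std)))
        extra = max(0, extra)  # never shrink a packet
        padded.append((timestamp, size + extra))
    return padded
```

Under these assumptions each packet grows by about 100 bytes on average, while the packet timing of the original user trace is preserved.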
The directory structure of the anomalous traffic class is as follows. The bandwidth_hog directory contains all the bandwidth hog attack traces. The poisson directory contains two sub-directories: Trial1 contains traces from the \( U_1 \) Poisson attack, while Trial2 contains traces from the \( U_2 \) Poisson attack. The udp_flood directory contains the traces from the original UDP attack found here.
Directory | Attack | # of Traces | # of Samples |
---|---|---|---|
/bandwidth_hog | Bandwidth hog | 28 | 109,157 |
/poisson/Trial1 | \( U_1 \) Poisson | 6 | 11,253 |
/poisson/Trial2 | \( U_2 \) Poisson | 4 | 10,260 |
/udp_flood | Original UDP DoS | 4 | 565 |
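The counts in the table can be tallied with a short script, for example to check how unevenly the attack types are represented before training. The numbers below are copied from the table. The dictionary layout and the function name `summarize` are illustrative only and are not part of the dataset tooling.

```python
# Rows copied from the table above: directory -> (attack, traces, samples).
ANOMALOUS_SETS = {
    "/bandwidth_hog": ("Bandwidth hog", 28, 109_157),
    "/poisson/Trial1": ("U1 Poisson", 6, 11_253),
    "/poisson/Trial2": ("U2 Poisson", 4, 10_260),
    "/udp_flood": ("Original UDP DoS", 4, 565),
}


def summarize(sets=ANOMALOUS_SETS):
    """Return total traces, total samples, and each directory's sample share."""
    total_traces = sum(traces for _, traces, _ in sets.values())
    total_samples = sum(samples for _, _, samples in sets.values())
    shares = {
        directory: samples / total_samples
        for directory, (_, _, samples) in sets.items()
    }
    return total_traces, total_samples, shares


if __name__ == "__main__":
    traces, samples, shares = summarize()
    print(f"{traces} traces, {samples} samples")
    for directory, share in shares.items():
        print(f"{directory}: {share:.1%} of anomalous samples")
```

The totals come to 42 traces and 131,235 samples. The bandwidth hog traces account for roughly 83% of the anomalous samples and the original UDP flood for well under 1%, which may be worth considering when sampling or weighting classes.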