Datasets for RF Fingerprinting of Bit-similar USRP X310 Radios
Download Datasets:
Please use the links below to download the datasets:
- Dataset #1: Raw IQ samples of over-the-air transmissions from 16 X310 USRP radios
- Dataset #2: Demodulated IQ symbols of over-the-cable transmissions with 16 configurations of IQ imbalances
Note: Both datasets are stored as 64-bit floating point values, not the 32-bit values specified in the metadata files. When parsing the binary data, use dtype=np.complex128 instead of dtype=np.complex64.
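A minimal sketch of reading a dataset file with the dtype recommended in the note above. The function name load_iq and its count parameter are illustrative and not part of the dataset release.

```python
import numpy as np


def load_iq(path, count=-1):
    """Read complex IQ values from a .sigmf-data file.

    The metadata declares "cf32", but per the note above the files hold
    64-bit floats, so each complex value occupies 16 bytes (complex128).
    count=-1 reads the whole file; a positive count reads that many values.
    """
    return np.fromfile(path, dtype=np.complex128, count=count)
```

For example, load_iq("WiFi_air_X310_3123D7B_2ft_run1.sigmf-data", count=1024) would return the first 1024 complex values of that recording, assuming the file is in the working directory.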
These datasets were used for the paper "ORACLE: Optimized Radio clAssification through Convolutional neuraL nEtworks" published in INFOCOM 2019. Please use this link to download the paper. Any use of this dataset which results in an academic publication or other publication which includes a bibliography should include a citation to our paper. Here is the reference for the work:
Conference version: PDF
K. Sankhe, M. Belgiovine, F. Zhou, S. Riyaz, S. Ioannidis, and K. R. Chowdhury, "ORACLE: Optimized Radio clAssification through Convolutional neuraL nEtworks," IEEE INFOCOM 2019, Paris, France, May 2019.
Extended version: PDF
K. Sankhe, M. Belgiovine, F. Zhou, L. Angioloni, F. Restuccia, S. D’Oro, T. Melodia, S. Ioannidis, and K. R. Chowdhury, "No Radio Left Behind: Radio Fingerprinting Through Deep Learning of Physical-Layer Hardware Impairments," IEEE Transactions on Cognitive Communications and Networking, Special Issue on Evolution of Cognitive Radio to AI-enabled Radio and Networks, 2019.
Description:
Our proposed RF fingerprinting approach, ORACLE, detects a unique radio from a large pool of bit-similar devices (same hardware, protocol, physical address, MAC ID) using only IQ samples at the physical layer. ORACLE follows two approaches: 1) it trains a convolutional neural network (CNN) to detect hardware-centric unique signatures (e.g., IQ imbalance, DC offsets) embedded in the transmitter radio chain; and 2) it uses receiver feedback to inject modifications in the transmitter chain to perform channel-independent RF fingerprinting. ORACLE achieves 99% classification accuracy for a 16-node USRP X310 SDR testbed and an external database of >100 COTS WiFi devices. To evaluate the performance of ORACLE's deep-learning model, we have created two standard datasets. These datasets can be used by fellow researchers to reproduce the original work or to explore other machine learning problems in the domain of wireless communication.
Fig. 1: Deep learning, such as a convolutional neural network (CNN), is used to detect unique transmitter signatures.
Fig. 2: Receiver feedback is used to inject impairments (e.g., IQ imbalance, DC offsets) to increase differentiability among radios.
Experimental Setup
ORACLE trains the CNN using IQ samples collected from an experimental setup of USRP SDRs, as shown in Fig. 3, with a fixed USRP B210 as the receiver. All transmitters are bit-similar USRP X310 radios that emit IEEE 802.11a standards-compliant frames generated via the MATLAB WLAN System Toolbox. The generated data frames contain random payloads but have the same address fields, and are streamed to the selected SDR for over-the-air wireless transmission. The receiver SDR samples the incoming signals at a 5 MS/s sampling rate at the WiFi center frequency of 2.45 GHz. Overall, we collect over 20 million samples for each radio. We conduct the experiments in a relatively open area with fewer reflections, as shown in Fig. 4. The transmitter-receiver separation distance is increased from 2 ft to 62 ft in steps of 6 ft.
Fig. 3: Experimental setup for data collection using SDR
Fig. 4: Experimental environment: open area with far fewer reflections
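The distance sweep described above can be enumerated in a few lines. This is only an illustration of the arithmetic (2 ft to 62 ft with a 6 ft interval). The folder names assume the "xxft" pattern described for Dataset #1 and have not been checked against the actual archive.

```python
# Separation distances: 2 ft to 62 ft inclusive, with a 6 ft interval.
distances_ft = list(range(2, 62 + 1, 6))

# Assumed folder names, following the "xxft" convention of Dataset #1.
folders = [f"{d}ft" for d in distances_ft]

print(len(distances_ft))  # 11
print(folders[0], folders[-1])  # 2ft 62ft
```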
Dataset Description:
We are releasing two datasets: a) Dataset #1: recordings of raw IQ samples collected from over-the-air transmissions of 16 USRP X310 transmitter radios; b) Dataset #2: recordings of demodulated IQ symbols collected after equalizing over-the-cable transmissions of 16 IQ imbalance configurations. In both datasets, each recording consists of two files: a metadata file and a dataset file. The dataset file is a binary file of digital samples, and the metadata file contains information that describes the dataset. Our metadata and data format is an extension of, and compatible with, the SigMF specifications.
- Dataset #1: It consists of recordings of raw IQ samples collected from 16 high-end X310 USRP SDRs with the same B210 radio as the receiver. The recordings are organized into folders named "xxft", where xx is the transmitter-receiver separation distance in feet. Each recording has a dataset file with the extension '.sigmf-data' and a metadata file with the extension '.sigmf-meta'. The file names follow a fixed format that encodes the recording parameters.
For example, the dataset file "WiFi_air_X310_3123D7B_2ft_run1" represents:
- WiFi --> IEEE 802.11a standard-compliant WLAN frame
- air --> medium of transmission
- X310 --> the type of USRP radio
- 3123D7B --> device serial ID
- 2ft --> the transmitter-receiver separation distance in feet
- run1 --> the recording number
- sigmf-data/sigmf-meta --> the extension of the dataset file/metadata file
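The naming convention above can be turned into labels with a small parser. The following is a sketch under the assumption that every Dataset #1 file follows exactly the pattern of the example name. The Recording type and parse_recording_name function are hypothetical helpers, not part of the dataset release.

```python
import re
from typing import NamedTuple


class Recording(NamedTuple):
    signal: str        # e.g. "WiFi"
    medium: str        # e.g. "air"
    radio: str         # e.g. "X310"
    serial: str        # device serial ID, e.g. "3123D7B"
    distance_ft: int   # transmitter-receiver separation in feet
    run: int           # recording number


# Assumed pattern: <signal>_<medium>_<radio>_<serial>_<N>ft_run<M>[.sigmf-data|.sigmf-meta]
_NAME = re.compile(
    r"^(?P<signal>[^_]+)_(?P<medium>[^_]+)_(?P<radio>[^_]+)_(?P<serial>[^_]+)"
    r"_(?P<distance>\d+)ft_run(?P<run>\d+)(?:\.sigmf-(?:data|meta))?$"
)


def parse_recording_name(name):
    """Split a Dataset #1 file name into its fields."""
    m = _NAME.match(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return Recording(
        m["signal"], m["medium"], m["radio"], m["serial"],
        int(m["distance"]), int(m["run"]),
    )


rec = parse_recording_name("WiFi_air_X310_3123D7B_2ft_run1.sigmf-data")
print(rec.serial, rec.distance_ft, rec.run)  # 3123D7B 2 1
```

The serial field is the natural class label for a fingerprinting classifier, while distance_ft and run are useful for splitting training and test data.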
- Dataset #2: It consists of recordings of demodulated IQ symbols obtained after equalizing over-the-cable transmissions from an X310 USRP SDR transmitter to a B210 radio as the receiver. To obtain each recording, we use the set_iq_balance function in GRC to apply a complex correction factor to the transmit chain of the RF daughterboard, which intentionally introduces the required level of impairment in the radio. Due to the intentional IQ imbalance, the demodulated symbols acquire unique characteristics that are device- and channel-invariant, as shown in Fig. 5. This makes the CNN robust to channel changes, i.e., it makes the transmitter hardware impairments dominate channel-induced variations.
Fig. 5: Patterns generated by 3 impairments on 2 devices under 2 channel conditions. The first and second rows show the channel invariance and device invariance of the patterns, respectively.
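To build intuition for how IQ imbalance reshapes demodulated symbols, here is a sketch using a common textbook baseband model, y = k1*x + k2*conj(x). This is an assumption for illustration only: it is not the UHD/GRC set_iq_balance implementation and not necessarily the impairment model used to generate Dataset #2. The function name, parameters, and the QPSK-like test symbols are all hypothetical.

```python
import numpy as np


def apply_iq_imbalance(x, gain=1.0, phase_rad=0.0):
    """Textbook IQ imbalance: the Q branch is scaled by `gain` and
    rotated by `phase_rad` relative to the I branch.

    Equivalent to y = I + 1j * gain * exp(1j * phase_rad) * Q.
    gain=1.0 and phase_rad=0.0 leave the signal unchanged.
    """
    x = np.asarray(x, dtype=np.complex128)
    g = gain * np.exp(1j * phase_rad)
    k1 = (1 + g) / 2  # weight of the wanted signal
    k2 = (1 - g) / 2  # weight of the mirror-image (conjugate) term
    return k1 * x + k2 * np.conj(x)


# Illustrative unit-energy QPSK-like constellation.
symbols = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)
distorted = apply_iq_imbalance(symbols, gain=1.1, phase_rad=np.deg2rad(5))
print(np.round(distorted, 3))
```

Under this model, each (gain, phase) pair skews the constellation in a fixed way regardless of which symbols are sent, which is the kind of pattern a classifier can learn.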
Similar to Dataset #1, each recording has a dataset file with the extension '.sigmf-data' and a metadata file with the extension '.sigmf-meta'. The file names follow a fixed format that encodes the recording parameters.
For example, the dataset file "Demod_WiFi_cable_X310_3123D76_IQ#1_run1" represents:
- Demod_WiFi --> demodulated IQ symbols obtained after equalizing raw IQ samples of an IEEE 802.11a standard-compliant WLAN frame
- cable --> medium of transmission
- X310 --> the type of USRP radio
- 3123D76 --> device serial ID
- IQ#1 --> IQ imbalance configuration number that introduces a specific level of IQ imbalance in the radio
- run1 --> the recording number
- sigmf-data/sigmf-meta --> the extension of the dataset file/metadata file
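For Dataset #2, the IQ imbalance configuration number is the natural class label. A minimal sketch of extracting it, assuming all file names match the example above. The function parse_demod_name is a hypothetical helper, not part of the dataset release.

```python
import re

# Assumed pattern: Demod_WiFi_<medium>_<radio>_<serial>_IQ#<k>_run<m>[.sigmf-data|.sigmf-meta]
_DEMOD_NAME = re.compile(
    r"^Demod_WiFi_(?P<medium>[^_]+)_(?P<radio>[^_]+)_(?P<serial>[^_]+)"
    r"_IQ#(?P<config>\d+)_run(?P<run>\d+)(?:\.sigmf-(?:data|meta))?$"
)


def parse_demod_name(name):
    """Return the fields of a Dataset #2 file name as a dict."""
    m = _DEMOD_NAME.match(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name!r}")
    fields = m.groupdict()
    fields["config"] = int(fields["config"])  # IQ imbalance configuration number
    fields["run"] = int(fields["run"])        # recording number
    return fields


info = parse_demod_name("Demod_WiFi_cable_X310_3123D76_IQ#1_run1.sigmf-meta")
print(info["serial"], info["config"], info["run"])  # 3123D76 1 1
```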
SigMF Description:
Global Object
The global object consists of name/value pairs that provide information applicable to the entire dataset. It contains the information that is minimally necessary to open and parse the dataset file, as well as general information about the recording itself. The following names are specified in the core namespace:
name | required | type | description |
---|---|---|---|
datatype | true | string | The format of the stored samples in the dataset file. |
sample_rate | true | double | The sample rate of the signal in samples per second. |
version | true | string | The version of the SigMF specification used to create the metadata file. |
sha512 | false | string | The SHA512 hash of the dataset file associated with the SigMF file. |
description | false | string | A text description of the SigMF recording. |
hw | false | string | A text description of the hardware used to make the recording. |
recorder | false | string | The name of the software used to make this SigMF recording. |
author | false | string | The author's name |
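When a metadata file includes the optional sha512 field, it can be used to check a downloaded dataset file for corruption. The sketch below assumes the metadata has already been parsed into a dict with a top-level "global" object and a "core:sha512" key. Both function names are illustrative.

```python
import hashlib


def sha512_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large recordings need not fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_metadata(data_path, meta):
    """Return True/False if the hash can be compared, or None if the
    optional core:sha512 field is absent from the metadata."""
    expected = meta.get("global", {}).get("core:sha512")
    if expected is None:
        return None
    return sha512_of_file(data_path) == expected.lower()
```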
Snapshot
{"global": {
"core:sha512": "b3ff6b996da344e35762f962893e69a9172367bb1e020bfadf2b245adaad9c2146853ce9657f2c7d619b61d63191fbc1741f481f1ed5d67ee7ddeea0029e9d51",
"core:version": "0.0.1",
"core:author": "Kunal Sankhe",
"core:sample_rate": 5000000.0,
"core:description": "SigMF IQ samples recording of demodulated data derived from over-the-cable WiFi transmissions collected by a fixed USRP B210 as a receiver. The transmitter emitted IEEE 802.11a standards compliant frames generated via a MATLAB WLAN System toolbox. Using UHD software, a controlled level of IQ imbalance is introduced at the runtime such that the demodulated symbols acquire unique characteristics.",
"core:datatype": "cf32"}
}
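A short sketch of using the global object once a metadata file has been parsed as JSON. The META_TEXT string is a trimmed stand-in for a real .sigmf-meta file, and the helper names are hypothetical. Note that the sample size is fixed at 16 bytes here because the files are stored as complex128, even though the metadata declares "cf32".

```python
import json

# Trimmed stand-in for the contents of a .sigmf-meta file.
META_TEXT = """
{"global": {"core:version": "0.0.1",
            "core:sample_rate": 5000000.0,
            "core:datatype": "cf32"}}
"""

# The data files hold complex128 values (2 x 64-bit floats) despite "cf32".
BYTES_PER_SAMPLE = 16


def samples_in_file(size_bytes):
    """Number of complex values in a dataset file of the given size."""
    return size_bytes // BYTES_PER_SAMPLE


def duration_seconds(num_samples, global_obj):
    """Recording length implied by the sample count and core:sample_rate."""
    return num_samples / global_obj["core:sample_rate"]


global_obj = json.loads(META_TEXT)["global"]
print(global_obj["core:datatype"])               # cf32
print(duration_seconds(20_000_000, global_obj))  # 4.0
```

At 5 MS/s, the 20 million samples collected per radio correspond to 4 seconds of signal.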
Captures
As per the SigMF specifications, the captures value is an array of capture segment objects that describe the parameters of the signal capture. It MUST be sorted by the value of each capture segment's core:sample_start key, ascending. The following names are specified in the core namespace:
name | required | type | description |
---|---|---|---|
sample_start | true | uint | The sample index in the dataset file at which this segment takes effect. |
frequency | false | double | The center frequency of the signal in Hz. |
datetime | false | string | An ISO-8601 string indicating the timestamp of the sample index specified by sample_start |
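The ordering requirement on the captures array is easy to check programmatically. A minimal sketch, assuming the captures value has been parsed into a list of dicts. The function names are illustrative.

```python
def captures_sorted(captures):
    """True if segments are in ascending order of core:sample_start."""
    starts = [segment["core:sample_start"] for segment in captures]
    return all(a <= b for a, b in zip(starts, starts[1:]))


def sort_captures(captures):
    """Return a new list ordered by core:sample_start."""
    return sorted(captures, key=lambda segment: segment["core:sample_start"])


captures = [{"core:sample_start": 4096}, {"core:sample_start": 0}]
print(captures_sorted(captures))                 # False
print(captures_sorted(sort_captures(captures)))  # True
```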
Annotations
According to the SigMF specifications, the annotations value is an array of annotation segment objects that describe anything regarding the signal data not part of the global and captures objects. Each SigMF annotation segment object must contain a core:sample_start name/value pair, which indicates the first index at which the rest of the segment's name/value pairs apply. We have extended the annotations with the genesys namespace, which defines the following names:
name | required | type | unit | description |
---|---|---|---|---|
environment | true | string | N/A | A description of the environment where the antenna is mounted. E.g. "indoor" or "outdoor". |
transmitter_identification | false | object | N/A | Transmitter identification parameters. See Transmitter Object definition. |
receiver_identification | false | object | N/A | Receiver identification parameters. See Receiver Object definition. |
distance | false | string | feet | Distance between transmitter and receiver |
Transmitter Object
The Transmitter object contains the following name/value pairs:
name | required | type | unit | description |
---|---|---|---|---|
model | true | string | N/A | Make and model of the transmitter. E.g., "Ettus N210", "Ettus B200", "Keysight N6841A", "Tektronix B206B". |
serial_number | false | string | N/A | Globally unique identifier |
low_frequency | false | float | Hz | Low frequency of operational range of the transmitter. |
high_frequency | false | float | Hz | High frequency of operational range of the transmitter. |
noise_figure | false | float | dB | Noise figure of the transmitter. |
max_power | false | float | dBm | Maximum power of the transmitter. |
antenna | true | object | N/A | See Antenna Object definition. |
Receiver Object
The Receiver object contains the following name/value pairs:
name | required | type | unit | description |
---|---|---|---|---|
model | true | string | N/A | Make and model of the receiver. E.g., "Ettus N210", "Ettus B200", "Keysight N6841A", "Tektronix B206B". |
serial_number | false | string | N/A | Globally unique identifier |
low_frequency | false | float | Hz | Low frequency of operational range of the receiver. |
high_frequency | false | float | Hz | High frequency of operational range of the receiver. |
noise_figure | false | float | dB | Noise figure of the receiver. |
max_power | false | float | dBm | Maximum input power of the receiver. |
antenna | true | object | N/A | See Antenna Object definition. |
Antenna Object
The Antenna object contains the following name/value pairs:
name | required | type | unit | description |
---|---|---|---|---|
model | true | string | N/A | Antenna make and model number. E.g. "ARA CSB-16", "L-com HG3512UP-NF". |
type | false | string | N/A | Antenna type. E.g. "dipole", "biconical", "monopole", "conical monopole". |
low_frequency | false | float | Hz | Low frequency of operational range. |
high_frequency | false | float | Hz | High frequency of operational range. |
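The "required" columns in the tables above can be checked with a small validator. This sketch rests on two assumptions that are not confirmed by the text: that annotation keys carry a "genesys:" prefix in the same way core names carry "core:", and that names inside the Transmitter, Receiver, and Antenna objects are unprefixed. The function name and the sample annotation are hypothetical.

```python
def missing_required(annotation):
    """List required names that are absent from one annotation segment."""
    missing = []
    for key in ("core:sample_start", "genesys:environment"):
        if key not in annotation:
            missing.append(key)
    # Transmitter and Receiver objects are optional, but if present they
    # must carry a model and an antenna, and the antenna must carry a model.
    for key in ("genesys:transmitter_identification",
                "genesys:receiver_identification"):
        radio = annotation.get(key)
        if radio is None:
            continue
        if "model" not in radio:
            missing.append(f"{key}.model")
        antenna = radio.get("antenna")
        if antenna is None:
            missing.append(f"{key}.antenna")
        elif "model" not in antenna:
            missing.append(f"{key}.antenna.model")
    return missing


# Hypothetical annotation; the values are placeholders, not dataset contents.
example = {
    "core:sample_start": 0,
    "genesys:environment": "outdoor",
    "genesys:transmitter_identification": {
        "model": "Ettus X310",
        "antenna": {"type": "dipole"},  # antenna model deliberately omitted
    },
}
print(missing_required(example))
# ['genesys:transmitter_identification.antenna.model']
```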