Non-standard Waveforms from Hovering Unmanned Aerial Vehicles (UAVs) Dataset
Download Datasets:
Please use the following link to download the 4.5 GB dataset containing non-standard signals from 7 hovering DJI M100 UAVs flying inside an RF anechoic chamber:
Dataset: UAV-M100-Hovering
Note: The dataset is released in SigMF format, and the binary data must be parsed as float16 values before use.
This dataset was created for the task of RF fingerprinting hovering UAVs. It was used in "RF Fingerprinting Unmanned Aerial Vehicles with Non-standard Transmitter Waveforms," accepted in IEEE Transactions on Vehicular Technology, 2020. Any use of this dataset that results in a publication with a bibliography section should include a citation to our paper. Here is the PDF and the reference for the paper:
Paper PDF
Problem:
RF fingerprinting relies on identifying discriminating transmitter-generated features at the receiver. These features include artifacts such as nonlinearities in the power amplifier gain, I/Q phase imbalance, and clock and frequency offsets, which arise mainly from slight variations in the operating points of the electronic components. While RF fingerprinting using deep learning has been shown to be very successful for static devices, to the best of our knowledge there are no works applying this technique to classifying identical hovering UAVs. We note that this is different from the well-investigated problem of UAV type detection, where the objective is to distinguish between different makes/models. Since the wireless transmitters in that setting, typically WiFi interface cards, come from different providers, fingerprinting these cards (and hence identifying the UAV) reduces to a simpler problem than classifying UAVs of the "same" make/model.
Furthermore, constant UAV hovering introduces complex channel variations between the transmitter UAV and the ground-based receiver, which need to be carefully studied. Our previous experimental studies show that the standard deviation in position around the target location can be as high as 0.85 m for a DJI M100 UAV using on-board GPS modules. In the domain of RF fingerprinting UAVs, the problem of hovering-induced channel variations has previously been ameliorated by equalizing the WiFi signals. However, in our case, where the M100 UAVs transmit their non-standard proprietary waveforms, equalization is not an option. In other words, we can rely only on raw I/Q samples for RF fingerprinting the hovering UAVs.
To tackle the problem of channel variations manifesting in the signals, we use a multi-classifier scheme. An overview of our multi-classifier method is shown in Figure 1. More information about our method can be found in the paper.
Dataset collection description:
In an RF anechoic chamber, we collect signals from 7 identical DJI M100 UAVs as transmitters. An Ettus USRP X310 equipped with a UBX 160 USRP daughterboard is used to capture signals at the receiver side. We fly the UAVs one at a time at distances of {6, 9, 12, 15} ft from the receiver while they transmit. The receiver collects I/Q samples only in the 10 MHz downlink channel where the UAV is transmitting. At each distance, we collect I/Q samples for ~2 seconds, pause for ~10 seconds, and then repeat this process 3 more times. The ~10-second pauses partition the overall received signal into 4 non-overlapping bursts, each containing ~140 interleaved short periods of data and noise. A high-level overview of the sequence collected at the receiver side for each UAV at a given distance is shown in Figure 2.
To complete the dataset, the sequence above is collected from all 7 UAVs, each flying at 4 different distances from the receiver. To prepare the sequences for our deep learning framework, we extract the portions containing data and separate them from the interleaved noise periods to form ~140 sequences per burst. From here on, we refer to these non-overlapping data sequences as "examples". With 7 UAVs, each having 4 distances, each distance having 4 bursts, and each burst having ~140 examples, we have more than 13k examples with an average length of ~92k I/Q samples in the dataset.
Dataset folder content:
The dataset contains ~13k examples (transmissions, separated from noise periods) in total. In the SigMF format, data sequences are stored in binary format in .bin files. In our dataset, each .bin file is a flat representation of one transmission in the form of interleaved I and Q values. Our data type is float16, and hence each I or Q value takes 2 bytes, as shown in Figure 3.
Each .bin file is accompanied by a .json file that contains the meta-data for that .bin file. In what follows, we describe the naming convention and the details of our meta-data files.
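The layout described above can be read with a few lines of NumPy. This is a minimal sketch rather than the authors' loader: the function name `load_iq` is hypothetical, and it assumes that I precedes Q in each pair and that values are little-endian (as the `cf16_le` datatype string in the meta-data suggests).

```python
import numpy as np


def load_iq(path):
    """Read a flat float16 .bin file of interleaved I/Q values as complex samples.

    Assumptions (not confirmed by the dataset page): I comes before Q in each
    pair, and values are little-endian float16.
    """
    raw = np.fromfile(path, dtype="<f2")  # 2 bytes per I or Q value
    if raw.size % 2 != 0:
        raise ValueError("odd number of values; expected interleaved I/Q pairs")
    # Widen to float32, then reinterpret each (I, Q) pair as one complex64 sample.
    return raw.astype(np.float32).view(np.complex64)
```

Given a file name that follows the convention below (for instance a hypothetical `uav1_6ft_burst1_1.bin`), `load_iq` would return a one-dimensional complex array whose length is half the number of float16 values in the file.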
Naming convention:
Each file is named with 4 different parameters in the form of A_B_C_D.bin or A_B_C_D.json.
- A is the UAV number and can be one of the values of {‘uav1’, ‘uav2’, …, ‘uav7’}.
- B is the distance between transmitter and receiver for that transmission. It can be one of the values of {‘6ft’, ‘9ft’, ‘12ft’, ‘15ft’}.
- C is the burst number. It can be one of the values of {‘burst1’, ‘burst2’, ‘burst3’, ‘burst4’}.
- D is the transmission number, starting from 1. Each burst contains around 140 transmissions. The transmission number indicates the temporal order of the transmissions within a burst.
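The naming convention above can be turned into a small parser. The sketch below is illustrative only: `TransmissionName` and `parse_name` are hypothetical names, and the pattern assumes that D is written as a plain integer with no padding or prefix.

```python
import re
from typing import NamedTuple


class TransmissionName(NamedTuple):
    uav: int           # A: 1..7
    distance_ft: int   # B: 6, 9, 12 or 15
    burst: int         # C: 1..4
    transmission: int  # D: order within the burst, starting from 1


# Assumed shape: uav<A>_<B>ft_burst<C>_<D>.bin or .json
_NAME_RE = re.compile(r"^uav(\d+)_(\d+)ft_burst(\d+)_(\d+)\.(?:bin|json)$")


def parse_name(filename):
    """Split an A_B_C_D.bin / A_B_C_D.json file name into its four fields."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return TransmissionName(*(int(g) for g in match.groups()))
```

Parsing the name once makes it straightforward to group examples by UAV, distance, or burst when building training and test splits.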
Meta Data Description:
As mentioned before, for each transmission recorded in a .bin file, there is a .json file with the same name as the .bin file. The .json file contains meta-data for that specific transmission. Inside the .json file, we have a set of key/value pairs that we describe below:
- global:
  - version: version of SigMF
  - sample_rate: sample rate of the signal in Hz
  - total_transmissions: total number of transmissions in this dataset
  - description: a short description of what this file is
  - record_date: date that the current dataset was created
  - datatype: type of data, either float16, float32, etc.
- captures:
  - sample_start: the index in the file where the samples start
  - center_frequency: center frequency in Hz
  - transmission_number: the index of this transmission, a number between 0 and total_transmissions-1
- annotations:
  - distance: the distance between transmitter and receiver for this transmission
  - sample_count: number of samples
  - protocol: transmission protocol used in this transmission
  - environment: the environment where the dataset is collected
  - transmitter:
    - device_id_genesys_lab: transmitter device ID in the genesys lab
    - antenna: antenna model
    - UAV: UAV name associated with the paper
    - make_and_model: transmitter UAV make and model
  - receiver:
    - radio: receiver device model
    - antenna_model: antenna model of the receiver
    - antenna_make: antenna make of the receiver
    - daughter_board: daughter board of the receiver radio
Snapshot of an example meta-data file from the UAV dataset:
{
  "global": {
    "core:version": "0.0.1",
    "core:sample_rate": 10000000,
    "core:total_transmissions": 13893,
    "core:description": "This is the meta file for a specific transmission in the UAV dataset",
    "core:record_date": "March 11, 2020",
    "core:datatype": "cf16_le"
  },
  "captures": {
    "core:sample_start": 0,
    "core:center_frequency": 2406500000,
    "core:transmission_number": 14
  },
  "annotations": {
    "core:distance": "6ft",
    "core:sample_count": 97928,
    "core:protocol": "Lightbridge",
    "core:environment": "RF anechoic chamber",
    "transmitter": {
      "core:device_id_genesys_lab": "m1005",
      "core:antenna": "M100 proprietary",
      "core:UAV": "uav2",
      "core:make_and_model": "DJI M100"
    },
    "receiver": {
      "core:radio": "Ettus USRP X310",
      "core:antenna_model": "3181 Broadband Mini-Bicon",
      "core:antenna_make": "ETS-Lindgren",
      "core:daughter_board": "UBX 160 USRP"
    }
  }
}
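As a rough illustration of how these fields can be used, the sketch below derives the duration of a transmission from its meta-data. It assumes that each .json file parses as standard JSON with top-level "global", "captures" and "annotations" objects, and that sample_count counts complex I/Q samples; the function name `transmission_duration_s` is hypothetical.

```python
import json


def transmission_duration_s(meta):
    """Duration in seconds = sample_count / sample_rate.

    Assumes sample_count counts complex I/Q samples (not individual
    float16 values) and that the top-level keys are as listed above.
    """
    sample_rate = meta["global"]["core:sample_rate"]
    sample_count = meta["annotations"]["core:sample_count"]
    return sample_count / sample_rate


# Values copied from the snapshot above; only the two needed keys are kept.
example_meta = json.loads("""
{
  "global": {"core:sample_rate": 10000000},
  "annotations": {"core:sample_count": 97928}
}
""")

duration = transmission_duration_s(example_meta)
print(f"{duration * 1e3:.2f} ms")
```

With the snapshot's values, 97928 samples at 10 MHz correspond to a transmission of roughly 9.8 ms.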