DroneAudioset - Augmented Human Lab

Abstract

Unmanned Aerial Vehicles (UAVs), or drones, are increasingly used in search and rescue missions to detect human presence. Existing systems primarily rely on vision-based methods, which are prone to failure under low visibility or occlusion. Drone-based audio perception offers promise but suffers from extreme ego-noise that masks the sounds indicating human presence. Existing datasets are either limited in diversity or synthetic, lacking real acoustic interactions, and there are no standardized setups for drone audition. To address these gaps, we present DroneAudioset, a comprehensive drone audition dataset featuring 23.5 hours of annotated recordings that cover a wide range of signal-to-noise ratios (SNRs), from -60 dB to 0 dB, across various drone types, throttle settings, microphone configurations, and environments. The dataset enables the development and systematic evaluation of noise suppression and classification methods for human-presence detection under challenging conditions, while also informing practical design considerations for drone audition systems, such as microphone placement trade-offs and the development of drone noise-aware audio processing. This dataset is an important step towards enabling the design and deployment of drone audition systems.

Figure: (a) Our experimental setup, with the drone attached to a fixed aluminum frame alongside two microphone arrays, M-up and M-down, and a single microphone, M-center. The source sounds (human vocal sounds, human presence sounds, and ambient sounds) are played through a speaker. (b) The actual setup: the drone frame, the microphone array, and the drones used.

Examples

SNR	Original Recording (Drone + Source)	Traditional (MVDR Beamforming + Spectral Gating)	Neural (MP-SENet)	Hybrid (MVDR Beamforming + MP-SENet)	Clean Recording (No Drone)
Class: HV (Male Speech/Scream)
-3.4 dB (SNR group: >-10 dB)	90db-speaker-volume-room1-drone1-speaker-dist-1m-mic-dist-50cm-throttle-50-mic3_8array-up
-21.5 dB (SNR group: <-10 dB)	90db-speaker-volume-room2-drone1-speaker-dist-3m-mic-dist-25cm-throttle-100-mic3_8array-up
Class: HNV (Human non-vocal sounds)
-3.4 dB (SNR group: >-10 dB)	90db-speaker-volume-room1-drone1-speaker-dist-1m-mic-dist-50cm-throttle-50-mic3_8array-up
-21.5 dB (SNR group: <-10 dB)	90db-speaker-volume-room2-drone1-speaker-dist-3m-mic-dist-25cm-throttle-100-mic3_8array-up
Class: NH (Non-Human sounds)
-3.4 dB (SNR group: >-10 dB)	90db-speaker-volume-room1-drone1-speaker-dist-1m-mic-dist-50cm-throttle-50-mic3_8array-up
-21.5 dB (SNR group: <-10 dB)	90db-speaker-volume-room2-drone1-speaker-dist-3m-mic-dist-25cm-throttle-100-mic3_8array-up
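
The recording names in the examples appear to encode the recording conditions (speaker volume, room, drone, speaker distance, microphone distance, throttle, and microphone label). The following is a minimal Python sketch of how such a name could be split into fields. The pattern is inferred only from the two names shown on this page, so it is an assumption and may not cover every file in the dataset. The function names `parse_recording_name` and `snr_group` are hypothetical helpers, not part of any released code, and the microphone label (e.g. `mic3_8array-up`) is kept as a raw string because its exact meaning is not stated here.

```python
import re

# Pattern inferred from the two example names on this page (an assumption,
# not an official specification of the dataset's naming scheme).
RECORDING_NAME = re.compile(
    r"(?P<speaker_volume_db>\d+)db-speaker-volume"
    r"-room(?P<room>\d+)"
    r"-drone(?P<drone>\d+)"
    r"-speaker-dist-(?P<speaker_dist_m>\d+)m"
    r"-mic-dist-(?P<mic_dist_cm>\d+)cm"
    r"-throttle-(?P<throttle>\d+)"
    r"-(?P<mic>.+)"
)


def parse_recording_name(name: str) -> dict:
    """Split a recording name into its condition fields (hypothetical helper)."""
    match = RECORDING_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized recording name: {name!r}")
    fields = match.groupdict()
    # Every field except the microphone label is numeric.
    parsed = {key: int(value) for key, value in fields.items() if key != "mic"}
    parsed["mic"] = fields["mic"]  # kept verbatim; its interpretation is not assumed
    return parsed


def snr_group(snr_db: float) -> str:
    """Assign the SNR group label used in the examples table.

    The page does not say which group an SNR of exactly -10 dB belongs to;
    this sketch places it in the lower group.
    """
    return ">-10 dB" if snr_db > -10 else "<-10 dB"


if __name__ == "__main__":
    name = ("90db-speaker-volume-room2-drone1-speaker-dist-3m"
            "-mic-dist-25cm-throttle-100-mic3_8array-up")
    print(parse_recording_name(name))
    print(snr_group(-21.5))
```

Parsing names into fields like this makes it straightforward to group recordings by condition, for example to compare enhancement results across throttle settings or microphone distances.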

Downloads

Code

GitHub Repository

Dataset

Download DroneAudioset (42.6 GB)

Contact

For questions and comments, please contact:

chitralekha[at]ahlab[dot]org; soundarya[at]ahlab[dot]org