Research Impact of data augmentation in broadband spectrograms

data augmentation
Fig. 1. Examples of data augmentation applied to the synthetic data sample with source depth $z_s$=5 m, CPA range $r_{\textrm{CPA}}$=100 m, and speed $v_\textrm{ship}$=1 kn using seabed \#1: (a) raw sample from the training data with no data augmentation included; (b) dropout; (c) uniform noise; (d) time stretching; (e) loudness augmentation by adding 10 dB to the spectrogram; (f) vertical flipping; (g) time warping; (h) time mask; (i) frequency mask; and (j) A combination of (g)-(h).

Abstract

Two residual networks are implemented to perform regression for source localization and environment classification using a moving mid-frequency source recorded during the Seabed Characterization Experiment in 2017. The first model implements only classification for inferring the seabed type, and the second model uses regression to estimate the source localization parameters. The training is done using synthetic data generated by the ORCA normal mode model. The architectures are tested on both measured field and simulated data with variations in the sound speed profile and seabed mismatch. Additionally, nine data augmentation techniques are implemented to study their effect on the network predictions. The metrics used to quantify network performance are the root mean square error for regression, and accuracy for seabed classification. The models report consistent results for the source localization estimation and accuracy above 65\% in the worst-case scenario for seabed classification. From the data augmentation study, results show that the more complex transformations such as time warping, time masking, frequency masking, and a combination of these techniques, yield significant improvement of the results using both simulated and measured data.

ResNet-18 for source localization and seabed classification tasks. The input of the ResNet-18 architecture consists of a spectrogram of size $1\times80\times1107$, after downsampling and smoothing the raw spectrograms. The output of the network corresponds to the seabed types (for only classification) and the source localization parameters (for only regression). The target variables are the source depth $z_s$, the CPA range $r_{\textrm{CPA}}$, the ship speed $v_\textrm{{ship}}$, and the seabed type or sediment.

residual network
Fig. 2. ResNet-18 architecture to classify seabeds and predict source location using towed tonal spectrograms. The colors in the figure represent the distinct basic blocks in the architecture. Each curved line in the figure represents a residual connection. Two different models are implemented with minor changes in the output layer depending on whether classification or regression is used.

Results of networks trained with and without data augmentation applied to synthetic tests A-E. RMSE is calculated for each test set and the five instances of the 3-reg model for source depth, CPA range, and ship speed. Seabed accuracy from the 4-class architecture. The confidence intervals shown at the top of each bar represent the standard deviation $(\sigma)$ of the errors and accuracy, respectively.

data augmentation results
Fig. 3. Performance of ResNet algorithms with and without data augmentation. The bars show a comparison of error and accuracy between the trained models without data augmentation (\textit{none} in the figure), and the nine different transformation used. The results are presented for (a) source depth $z_s$, (b) CPA range $r_{\textrm{CPA}}$, (c) ship speed $v_{\textrm{ship}}$ and (d) seabed type. Results in panels (a), (b) and (c) are from ResNet trained for only regression outputs, whereas panel (d) contains the results for ResNet trained for only classification.

Multiple data transformations have been applied to different testing sets to study the influence of such changes on the accuracy of the models. Out of the nine augmentations implemented during training, the loudness transformation impacted results negatively. Dropout, uniform noise, and time warping augmentations did not contribute to significant improvements for the network predictions. In contrast, time stretching, flipping, time masking, frequency masking, and the combined transformation positively impacted the ability of the trained networks to generalize to synthetic datasets with environmental mismatch and to the measured data samples. The improvement in the predictions showed data augmentation helps deep networks to focus only on the most relevant features in the data during training. Results demonstrated these data transformations are a reliable regularization technique to improve ResNet performance for seabed classification and source localization using mid-frequency spectrograms, outperforming predictions obtained without data augmentation.


Related publications


Related conferences


Related projects