SBAA490A December 2021 – April 2022 PCM6120-Q1, TAA5212, TAC5111, TAC5112, TAC5211, TAC5212, TLV320ADC5120, TLV320ADC6120

 


VAD Results

This section presents the VAD results. Algorithm performance is shown as receiver operating characteristic (ROC) curves, which describe the detection performance across operating thresholds from –12 dB to –3 dB. ROC plots are included for three noise scenarios from the Aurora Noise database (car in Figure 3-1, restaurant in Figure 3-2, and subway in Figure 3-3), combined with speech signals from the NOIZEUS Speech database. Test vectors are generated by mixing the noise and speech signals at SNRs of 12, 18, and 24 dB, where SNR is the separation between the power levels of the speech and noise signals (for example, a 12-dB SNR means the noise power level is 12 dB below the speech power level). On each curve, the operating point is at the extreme top left for the –12-dB threshold and moves towards the right as the threshold is increased. The better performance for both speech hit rate and non-speech hit rate is obtained at the –7-dB threshold (see Figure 3-4).
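The mixing step described above can be sketched as follows. This is only an illustration, not the procedure used to produce the published results: it assumes power is measured as the mean of the squared samples over the whole signal (the source does not state how the power levels were measured), and the function names `mean_power`, `measure_snr_db`, and `mix_at_snr`, as well as the synthetic stand-in signals, are hypothetical.

```python
import math
import random


def mean_power(samples):
    """Mean of the squared samples (average power)."""
    return sum(x * x for x in samples) / len(samples)


def measure_snr_db(speech, noise):
    """SNR in dB: speech power level minus noise power level."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` to sit `snr_db` below `speech`, then add the two.

    Returns (mixed, scaled_noise) so the achieved SNR can be checked.
    Assumption: power is averaged over the whole signal.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0.0 or p_noise == 0.0:
        raise ValueError("signals must have non-zero power")
    # Noise power that places the noise snr_db below the speech.
    target_noise_power = p_speech / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / p_noise)
    scaled_noise = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled_noise)]
    return mixed, scaled_noise


# Stand-in signals (hypothetical): a 200-Hz tone in place of speech and
# uniform random samples in place of recorded noise, 1 s at 16 kHz.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 200 * n / 16000) for n in range(16000)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(16000)]

for target_snr in (12, 18, 24):
    mixed, scaled_noise = mix_at_snr(speech, noise, target_snr)
    print(target_snr, round(measure_snr_db(speech, scaled_noise), 6))
```

With real recordings, a speech-activity-weighted level measurement may be preferable to a whole-file average, because pauses in the speech lower the average power.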

Figure 3-1 Non-Speech Hit Rate vs Speech Hit Rate for Car Noise
Figure 3-2 Non-Speech Hit Rate vs Speech Hit Rate for Restaurant Noise
Figure 3-3 Non-Speech Hit Rate vs Speech Hit Rate for Subway Noise
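The two axes of these plots can be computed from frame-level labels. The source does not define the two rates, so the sketch below assumes the usual definitions: speech hit rate is the fraction of reference speech frames that the detector marks as speech, and non-speech hit rate is the fraction of reference non-speech frames that the detector marks as non-speech. The function name `hit_rates` and the example labels are hypothetical.

```python
def hit_rates(reference, detected):
    """Return (speech_hit_rate, non_speech_hit_rate).

    `reference` and `detected` are equal-length sequences of booleans,
    one per frame, where True means "speech".
    Assumed definitions (not given in the source):
      speech hit rate     = correctly detected speech frames / speech frames
      non-speech hit rate = correctly detected non-speech frames / non-speech frames
    """
    if len(reference) != len(detected):
        raise ValueError("label sequences must have the same length")
    speech_total = sum(1 for r in reference if r)
    non_speech_total = len(reference) - speech_total
    if speech_total == 0 or non_speech_total == 0:
        raise ValueError("reference must contain both speech and non-speech frames")
    speech_hits = sum(1 for r, d in zip(reference, detected) if r and d)
    non_speech_hits = sum(1 for r, d in zip(reference, detected) if not r and not d)
    return speech_hits / speech_total, non_speech_hits / non_speech_total


# Hypothetical labels for eight frames.
reference = [False, False, True, True, True, True, False, False]
detected = [False, True, True, True, True, False, False, False]

print(hit_rates(reference, detected))  # (0.75, 0.75)
```

Repeating this calculation for each threshold setting gives one operating point per threshold, which is how a curve like those in the figures would be traced.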

Based on the collected data, the –7-dB threshold was chosen because it gives the best speech hit rate and non-speech hit rate across the different noise types. Figure 3-4 shows the ROC curve at the –7-dB threshold for the different noise types.

Figure 3-4 Non-Speech Hit Rate vs Speech Hit Rate at –7-dB Threshold for 12-dB SNR