A comparative study for throat microphone speech enhancement with different approaches

Md. Easir Arafat 1, 2, *, Indraneel Misra 1, 2 and Md. Ekramul Hamid 3

1 Department of Computer Science & Engineering, Bangladesh.
2 Pundra University of Science & Technology, Bogura, Bangladesh.
3 University of Rajshahi, Rajshahi, Bangladesh.
 
Research Article
International Journal of Science and Research Archive, 2024, 13(01), 850–859.
Article DOI: 10.30574/ijsra.2024.13.1.1631
Publication history: 
Received on 23 July 2024; revised on 16 September 2024; accepted on 18 September 2024
 
Abstract: 
Throat microphones (TM) offer significant advantages in noisy environments by capturing speech signals directly from the throat, thus minimizing external noise. However, TM signals often lack clarity and intelligibility compared to those from conventional microphones. This paper presents a comparative study of three prominent feature extraction techniques, Mel-frequency cepstral coefficients (MFCC), Linear Predictive Cepstral Coefficients (LPCC), and Perceptual Linear Prediction (PLP), for enhancing speech captured by throat microphones. Each technique is evaluated on its ability to enhance speech clarity and reduce noise interference. Experimental results on the ATR503 dataset, consisting of throat and close-talk microphone recordings, reveal that LPCC achieved an average Signal-to-Noise Ratio (SNR) improvement of 3 dB and Perceptual Evaluation of Speech Quality (PESQ) score increases of 1.3133 and 0.9553 compared to MFCC and PLP, respectively. In subjective evaluations, LPCC received the highest mean rating of 8.46, indicating that it was perceived as the most intelligible and clear. LPC spectra analysis demonstrates the effectiveness of LPCC in retrieving missing frequencies in speech captured by throat microphones. These findings suggest that LPCC is a robust method for throat microphone speech enhancement, offering significant improvements in speech intelligibility and quality in noisy environments.
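
The abstract reports an SNR improvement in dB but does not state how SNR is computed. As an illustration only, the sketch below uses a common waveform-domain definition and assumes a time-aligned clean reference (for example, the close-talk recording) is available. The function names `snr_db` and `snr_improvement_db` are hypothetical and are not taken from the paper.

```python
import math


def snr_db(reference, estimate):
    """SNR in dB of `estimate` measured against a time-aligned `reference`.

    Assumed definition: 10 * log10(sum(ref^2) / sum((ref - est)^2)).
    The paper may use a different (e.g. segmental) variant.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_power = sum(r * r for r in reference)
    if signal_power == 0:
        raise ValueError("reference signal has zero energy")
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise_power == 0:
        return math.inf  # estimate is identical to the reference
    return 10.0 * math.log10(signal_power / noise_power)


def snr_improvement_db(reference, unprocessed, enhanced):
    """Gain in SNR (dB) of the enhanced signal over the unprocessed one."""
    return snr_db(reference, enhanced) - snr_db(reference, unprocessed)


# Toy signals: the enhanced version halves the error amplitude.
reference = [1.0, -1.0, 1.0, -1.0]
unprocessed = [r + 0.5 for r in reference]
enhanced = [r + 0.25 for r in reference]
improvement = snr_improvement_db(reference, unprocessed, enhanced)
```

In this toy case, halving the error amplitude raises the SNR by 20·log10(2), about 6 dB. Under the same assumed definition, the 3 dB improvement reported above would correspond to roughly halving the residual noise power.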
 
Keywords: 
Throat Microphone (TM); Mel-frequency cepstral coefficients (MFCC); Linear Predictive Cepstral coefficients (LPCC); Perceptual Linear Prediction (PLP); LPC Spectra; Perceptual Evaluation of Speech Quality (PESQ); Signal-to-Noise Ratio (SNR)
 