EUSIPCO'2002 - Conference proceedings

Paper #674


Title:
Posteriors correction using feedback synthesis loop in robust ASR

Author(s):
Glotin Hervé, ERSS-CNRS

Page numbers in the proceedings:
Volume III pp 603-606

Session:
Language and Speech Recognition

Paper abstract:
Current Automatic Speech Recognition (ASR) systems perform poorly on noisy speech. We propose a new strategy to reinforce ASR robustness, based on a feedback loop from recognition posteriors to signal synthesis. The key idea is to use the phoneme posteriors generated by recognition to compute, at each frame, an acoustic image (AI) and its correlation with the input signal. The AI is the weighted sum of the clean phoneme spectra, where the weights are taken directly as the corresponding phoneme posteriors. The correlation between the AI and the input spectrum gives a Recognition Index (RI). We then show how a simple correction function applied to the posterior distribution using the RI improves the Word Error Rate in a continuous speech recognition task compared to a state-of-the-art ASR system (J-RASTA).
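
The per-frame loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the abstract does not say which correlation measure or which correction function is used, so the Pearson correlation in `recognition_index` and the interpolation toward a uniform distribution in `correct_posteriors` are assumptions. All function names and the toy spectra are hypothetical.

```python
import math


def acoustic_image(posteriors, clean_spectra):
    """Acoustic image (AI): weighted sum of clean phoneme spectra,
    with the phoneme posteriors of the current frame as weights."""
    n_bins = len(clean_spectra[0])
    return [
        sum(p * spec[k] for p, spec in zip(posteriors, clean_spectra))
        for k in range(n_bins)
    ]


def recognition_index(image, observed):
    """Recognition Index (RI) as the Pearson correlation between the AI
    and the input spectrum. ASSUMPTION: the abstract only says
    'correlation'; the exact measure is not specified."""
    n = len(image)
    mean_a = sum(image) / n
    mean_o = sum(observed) / n
    cov = sum((a - mean_a) * (o - mean_o) for a, o in zip(image, observed))
    var_a = sum((a - mean_a) ** 2 for a in image)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_a == 0 or var_o == 0:
        return 0.0  # flat spectrum: correlation undefined, treat as no match
    return cov / math.sqrt(var_a * var_o)


def correct_posteriors(posteriors, ri):
    """HYPOTHETICAL correction function: keep the posteriors when RI is
    high, flatten them toward a uniform distribution when RI is low.
    The function actually used in the paper is not given in the abstract."""
    trust = min(1.0, max(0.0, ri))
    uniform = 1.0 / len(posteriors)
    return [trust * p + (1.0 - trust) * uniform for p in posteriors]


# Toy frame: two phonemes, four spectral bins (made-up numbers).
clean_spectra = [[1.0, 0.0, 0.0, 2.0], [0.0, 3.0, 1.0, 0.0]]
posteriors = [0.75, 0.25]
noisy_frame = [1.0, 0.5, 0.0, 2.0]

image = acoustic_image(posteriors, clean_spectra)
ri = recognition_index(image, noisy_frame)
corrected = correct_posteriors(posteriors, ri)
```

Under these assumptions, a frame whose synthesized image matches the input well (RI near 1) keeps its posteriors almost unchanged, while a poorly matching frame contributes a flatter, less confident distribution to the decoder.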