LABORATORIO DE COMUNICACIÓN ORAL
ROBERT WAYNE NEWCOMB
R. Martínez, A. Alvarez, P. Gómez, M. Pérez, V. Nieto and V. Rodellar
- "A Speech Pre-processing Technique for End-Point Detection in Highly Non-stationary
- Rhodes, Greece, 22-25 September, 1997.
- The determination of the precise moment at which speech begins or ends is an
important problem in automatic speech recognition (ASR). As shown in , small deviations from the optimum
beginning and end points lead to a large decrease in recognition accuracy.
The presence of noise , especially when its level is high (around 95 dB, as
in this work) and its characteristics are highly non-stationary, is
an additional problem, since it can cause false triggers (more likely when the noise
includes speech sounds). For this reason, under such conditions it is important
to have a pre-processing stage that removes as much noise as possible and
provides cues that help to build an end-point detector for those environments.
The method presented here is a pre-processing technique for highly noisy and non-stationary
environments which, while enhancing the speech, also yields an equalised version of the
SNR improvement, called the Mean Spectral Energy Difference. Its main characteristic is that
large differences in the noise level are reduced to a small ripple, whereas the presence
of speech is marked by a large decrease in the Mean Spectral Energy Difference. With
this pre-processing, any end-point detection approach (explicit, implicit or hybrid)
may yield acceptable results.
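The abstract does not define the Mean Spectral Energy Difference precisely, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that frames arrive as linear power spectra and that the enhancer is plain power spectral subtraction with a spectral floor (a stand-in, not the paper's enhancement stage). It also assumes that a frame's MSED is the mean over frequency bins of the energy, in dB, that the enhancer removes, and that the recording starts with noise only. The names `enhance_frame`, `msed`, `msed_track` and `detect_endpoints`, and the parameters `floor`, `alpha`, `ratio`, `threshold` and `min_frames`, are hypothetical. With a floor, a noise-only frame whose level matches the noise estimate loses about -10·log10(floor) dB (20 dB for floor = 0.01) whatever its absolute level. A speech frame loses little energy, so the measure drops sharply when speech is present.

```python
import math
from typing import List, Optional, Sequence, Tuple

Spectrum = Sequence[float]  # linear-scale power spectrum of one frame


def enhance_frame(noisy: Spectrum, noise: Spectrum, floor: float) -> List[float]:
    # Hypothetical stand-in enhancer: power spectral subtraction with a
    # spectral floor, so no bin loses more than -10*log10(floor) dB.
    return [max(p - n, floor * p) for p, n in zip(noisy, noise)]


def msed(noisy: Spectrum, enhanced: Spectrum, eps: float = 1e-12) -> float:
    # Assumed definition: mean over bins of the dB energy removed by the enhancer.
    diffs = [10.0 * math.log10(max(p, eps) / max(e, eps))
             for p, e in zip(noisy, enhanced)]
    return sum(diffs) / len(diffs)


def msed_track(frames: Sequence[Spectrum], n_init: int = 5, floor: float = 0.01,
               alpha: float = 0.9, ratio: float = 0.5) -> List[float]:
    # Initial noise estimate: average of the first n_init frames
    # (assumes the recording starts with noise only).
    n_bins = len(frames[0])
    noise = [sum(f[k] for f in frames[:n_init]) / n_init for k in range(n_bins)]
    noise_only_msed = -10.0 * math.log10(floor)  # 20 dB when floor = 0.01
    track = []
    for frame in frames:
        d = msed(frame, enhance_frame(frame, noise, floor))
        track.append(d)
        if d > ratio * noise_only_msed:
            # Frame looks like noise: let the estimate follow the noise level.
            noise = [alpha * n + (1.0 - alpha) * p for n, p in zip(noise, frame)]
    return track


def detect_endpoints(track: Sequence[float], threshold: float,
                     min_frames: int = 3) -> Optional[Tuple[int, int]]:
    # Speech = runs of at least min_frames consecutive frames whose MSED drops
    # below threshold; shorter dips are treated as false triggers and ignored.
    runs = []
    start = None
    for i, d in enumerate(list(track) + [math.inf]):  # sentinel closes last run
        if d < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_frames:
                runs.append((start, i - 1))
            start = None
    if not runs:
        return None
    return runs[0][0], runs[-1][1]


# Synthetic example: 10 noise frames, 6 louder "speech" frames, 10 noise frames.
frames = [[1.0] * 8] * 10 + [[101.0] * 8] * 6 + [[1.0] * 8] * 10
print(detect_endpoints(msed_track(frames), threshold=10.0))  # (10, 15)
```

In this sketch the noise estimate only adapts in frames classified as noise. An abrupt rise in noise level would therefore be mistaken for speech. The abstract describes its equalised measure as turning such level changes into a small ripple, and that behaviour is not reproduced here.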
Download the article:
PostScript format (664 KB)
Zip-compressed (144 KB)
Madrid, 9 June 2004