An efficient first pass of a two-stage approach for automatic language identification of telephone speech
Automatic language identification, recognizing a speaker's language from a speech signal, is gaining importance in the context of economic globalization. The most accurate systems for this task use multiple large-vocabulary continuous speech recognizers, but their scalability is limited: each new language requires a complete recognizer and an enormous amount of training. For greater efficiency, we propose a Two-Stage approach. First, a low-cost clustering algorithm selects the top candidate languages with an accuracy of over 80%. Then, high-cost speech recognizers would be used to pick the correct language from the approximately three remaining candidates.
In this research, we describe the first stage of this Two-Stage process, which narrows the list of possible languages through an extremely scalable and flexible system. We cover how test and reference patterns (acoustic feature vectors) are extracted from speech utterances, how cepstral coefficients are used, and how reference models are generated from the reference patterns using the Vector Quantization (VQ) clustering algorithm. We also examine various distance measures in the selection phase to find the best one. We show that using a top-N strategy in this first stage leads to substantial improvements over existing systems in discriminating between languages. In our experiments, the correct language was among the top three choices 87.2% of the time in a 5-language task and 88.6% of the time in a 7-language task, and among the top five choices 87.5% of the time in a 10-language task. A second iteration in the 10-language task narrows the field to three choices with 80% probability. Further research could improve the entire process, but combining this methodology with current best practices has proven an extremely efficient and cost-effective way to address the challenges of automatic language identification.
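The first stage described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not give the codebook size, the training procedure, or which distance measure performed best. The sketch therefore assumes that cepstral feature vectors have already been extracted from each utterance. It builds each language's reference model as a plain k-means VQ codebook, which is a swapped-in stand-in for whatever VQ training the thesis used. A test utterance is scored by its average distortion against each codebook, and the languages with the lowest distortion are kept as the top-N candidates. All function names (`train_codebook`, `avg_distortion`, `top_n`, and so on) and the synthetic data are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def euclidean_sq(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def city_block(a: Vector, b: Vector) -> float:
    """City-block (L1) distance, an alternative measure for the selection phase."""
    return sum(abs(x - y) for x, y in zip(a, b))


def train_codebook(vectors: List[Vector], size: int, iters: int = 20,
                   seed: int = 0) -> List[List[float]]:
    """Hypothetical k-means VQ codebook (assumed stand-in for the thesis's VQ training).

    Training always uses squared Euclidean distance; `size` must not exceed len(vectors).
    """
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iters):
        # Assign every training vector to its nearest codeword.
        clusters: List[List[Vector]] = [[] for _ in codebook]
        for v in vectors:
            best = min(range(len(codebook)), key=lambda i: euclidean_sq(v, codebook[i]))
            clusters[best].append(v)
        # Move each codeword to the centroid of its cluster (leave it if the cluster is empty).
        for i, members in enumerate(clusters):
            if members:
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def avg_distortion(test: List[Vector], codebook: List[List[float]],
                   dist: Distance) -> float:
    """Mean distance from each test vector to its nearest codeword."""
    return sum(min(dist(v, c) for c in codebook) for v in test) / len(test)


def top_n(test: List[Vector], models: Dict[str, List[List[float]]], n: int,
          dist: Distance = euclidean_sq) -> List[Tuple[str, float]]:
    """Rank languages by distortion (lower is better) and keep the best n candidates."""
    scores = [(lang, avg_distortion(test, cb, dist)) for lang, cb in models.items()]
    scores.sort(key=lambda s: s[1])
    return scores[:n]


if __name__ == "__main__":
    rng = random.Random(42)

    def fake_frames(center: float, count: int) -> List[List[float]]:
        # Hypothetical stand-in for cepstral frames: noisy 4-D points around a center.
        return [[center + rng.gauss(0, 0.3) for _ in range(4)] for _ in range(count)]

    centers = {"lang_a": 0.0, "lang_b": 1.0, "lang_c": 2.0, "lang_d": 3.0, "lang_e": 4.0}
    models = {lang: train_codebook(fake_frames(c, 200), size=8) for lang, c in centers.items()}
    utterance = fake_frames(2.1, 50)  # synthetic utterance nearest to lang_c
    for dist in (euclidean_sq, city_block):
        print(dist.__name__, top_n(utterance, models, n=3, dist=dist))
```

The distance measure is passed in as a parameter so that different measures can be compared in the selection phase without retraining the codebooks. Only the short top-N list would then go to the more expensive second stage.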
Jonathan K Law,
"An efficient first pass of a two-stage approach for automatic language identification of telephone speech"
(January 1, 2002).
ETD Collection for Pace University.