Hybrid speech recognition models combine a neural acoustic model with a language model, which rescores the output of the acoustic model to find the most linguistically likely transcript. The language model is therefore of key importance in both open-domain and domain-specific speech recognition, and much work has been done on adapting the language model to the speech input.

This paper presents an efficient pipeline for hybrid speech recognition in which a domain-specific language model is selected for each utterance based on the result of domain classification. Experiments on public Vietnamese speech recognition datasets show improvements in accuracy over the baseline speech recognition model with little increase in running time.
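
The idea of selecting a language model per utterance can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the toy unigram models in `DOMAIN_LMS`, the vocabulary-overlap classifier `classify_domain`, the interpolation weight `lm_weight`, and the English example words are all hypothetical stand-ins. It assumes the acoustic model has already produced an n-best list with log scores, and that the domain is classified from the first-pass transcript.

```python
import math

# Toy per-domain unigram language models (hypothetical probabilities).
DOMAIN_LMS = {
    "weather": {"rain": 0.2, "today": 0.2, "sunny": 0.2, "the": 0.2},
    "finance": {"bank": 0.2, "rate": 0.2, "loan": 0.2, "the": 0.2},
}
UNK_PROB = 1e-4  # probability assigned to words a model has not seen


def lm_log_prob(words, lm):
    """Log-probability of a word sequence under a unigram model."""
    return sum(math.log(lm.get(w, UNK_PROB)) for w in words)


def classify_domain(words):
    """Stand-in classifier: pick the domain whose vocabulary overlaps most."""
    return max(DOMAIN_LMS, key=lambda d: sum(w in DOMAIN_LMS[d] for w in words))


def rescore(nbest, lm_weight=0.5):
    """Select a domain LM from the first-pass transcript, then rescore.

    nbest: list of (transcript, acoustic_log_score), first-pass best first.
    Returns (chosen_domain, best_transcript_after_rescoring).
    """
    first_pass_words = nbest[0][0].split()
    domain = classify_domain(first_pass_words)
    lm = DOMAIN_LMS[domain]
    best = max(
        nbest,
        key=lambda h: h[1] + lm_weight * lm_log_prob(h[0].split(), lm),
    )
    return domain, best[0]


if __name__ == "__main__":
    hypotheses = [("the rein today", -10.0), ("the rain today", -10.5)]
    print(rescore(hypotheses))
```

In this toy case the first-pass hypothesis contains a misrecognized word, but it still overlaps enough with the weather vocabulary for that domain's model to be chosen, and the rescoring step then prefers the in-domain hypothesis. The extra cost per utterance is one classification call and one rescoring pass, which is in the spirit of the small running-time overhead reported above.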


  • Dang Hoang Vu – FPT Technology Research Institute, Hanoi, Vietnam
  • Van Huy Nguyen – FPT Technology Research Institute, Hanoi, Vietnam; Thai Nguyen University of Technology, Thai Nguyen, Vietnam
  • Phuong Le-Hong – FPT Technology Research Institute, Hanoi, Vietnam; College of Science, Vietnam National University, Hanoi, Vietnam

