This week, Graz, Austria hosts the 20th Annual Conference of the International Speech Communication Association (Interspeech 2019), one of the world's largest conferences on the research and engineering of spoken language processing. Over 2,000 experts in speech-related research fields are gathering to take part in oral presentations and poster sessions, with events streamed to collaborators across the globe.

As a Gold Sponsor of Interspeech 2019, Google is excited to present 30 research publications and to demonstrate some of the impact speech technology has made in our products, from accessible, automatic video captioning to a more robust, reliable Google Assistant. You can learn more about the Google research being presented at Interspeech 2019 below.

Accepted Publications

1. Building Large-Vocabulary ASR Systems for Languages Without Any Audio Training Data

Manasa Prasad, Daan van Esch, Sandy Ritchie, Jonas Fromseier Mortensen

2. Multi-Microphone Adaptive Noise Cancellation for Robust Hotword Detection

Yiteng Huang, Turaj Shabestary, Alexander Gruenstein, Li Wan

3. Direct Speech-to-Speech Translation with a Sequence-to-Sequence Model

Ye Jia, Ron Weiss, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, Yonghui Wu

4. Improving Keyword Spotting and Language Identification via Neural Architecture Search at Scale

Hanna Mazzawi, Javier Gonzalvo, Aleks Kracun, Prashant Sridhar, Niranjan Subrahmanya, Ignacio Lopez Moreno, Hyun Jin Park, Patrick Violette

5. Shallow-Fusion End-to-End Contextual Biasing

Ding Zhao, Tara Sainath, David Rybach, Pat Rondon, Deepti Bhatia, Bo Li, Ruoming Pang

6. VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking

Quan Wang, Hannah Muckenhirn, Kevin Wilson, Prashant Sridhar, Zelin Wu, John Hershey, Rif Saurous, Ron Weiss, Ye Jia, Ignacio Lopez Moreno

7. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Daniel Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin Dogus Cubuk, Quoc Le

8. Two-Pass End-to-End Speech Recognition

Ruoming Pang, Tara Sainath, David Rybach, Yanzhang He, Rohit Prabhavalkar, Mirko Visontai, Qiao Liang, Trevor Strohman, Yonghui Wu, Ian McGraw, Chung-Cheng Chiu

9. On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition

Kazuki Irie, Rohit Prabhavalkar, Anjuli Kannan, Antoine Bruguier, David Rybach, Patrick Nguyen

10. Contextual Recovery of Out-of-Lattice Named Entities in Automatic Speech Recognition

Jack Serrino, Leonid Velikovich, Petar Aleksic, Cyril Allauzen

11. Joint Speech Recognition and Speaker Diarization via Sequence Transduction

Laurent El Shafey, Hagen Soltau, Izhak Shafran

12. Personalizing ASR for Dysarthric and Accented Speech with Limited Data

Joel Shor, Dotan Emanuel, Oran Lang, Omry Tuval, Michael Brenner, Julie Cattiau, Fernando Vieira, Maeve McNally, Taylor Charbonneau, Melissa Nollstadt, Avinatan Hassidim, Yossi Matias

13. An Investigation Into On-Device Personalization of End-to-End Automatic Speech Recognition Models

Khe Chai Sim, Petr Zadrazil, Francoise Beaufays

14. Salient Speech Representations Based on Cloned Networks

Bastiaan Kleijn, Felicia Lim, Michael Chinen, Jan Skoglund

15. Cross-Lingual Consistency of Phonological Features: An Empirical Study

Cibu Johny, Alexander Gutkin, Martin Jansche

16. LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech

Heiga Zen, Viet Dang, Robert Clark, Yu Zhang, Ron Weiss, Ye Jia, Zhifeng Chen, Yonghui Wu

17. Improving Performance of End-to-End ASR on Numeric Sequences

Cal Peyser, Hao Zhang, Tara Sainath, Zelin Wu

18. Developing Pronunciation Models in New Languages Faster by Exploiting Common Grapheme-to-Phoneme Correspondences Across Languages

Harry Bleyan, Sandy Ritchie, Jonas Fromseier Mortensen, Daan van Esch

19. Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models

Ke Hu, Antoine Bruguier, Tara Sainath, Rohit Prabhavalkar, Golan Pundak

20. Fréchet Audio Distance: A Reference-free Metric for Evaluating Music Enhancement Algorithms

Kevin Kilgour, Mauricio Zuluaga, Dominik Roblek, Matthew Sharifi

21. Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning

Yu Zhang, Ron Weiss, Heiga Zen, Yonghui Wu, Zhifeng Chen, RJ Skerry-Ryan, Ye Jia, Andrew Rosenberg, Bhuvana Ramabhadran

22. Sampling from Stochastic Finite Automata with Applications to CTC Decoding

Martin Jansche, Alexander Gutkin

23. Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model

Anjuli Kannan, Arindrima Datta, Tara Sainath, Eugene Weinstein, Bhuvana Ramabhadran, Yonghui Wu, Ankur Bapna, Zhifeng Chen, SeungJi Lee

24. A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet

Jean-Marc Valin, Jan Skoglund

25. Low-Dimensional Bottleneck Features for On-Device Continuous Speech Recognition

David Ramsay, Kevin Kilgour, Dominik Roblek, Matthew Sharifi

26. Unified Verbalization for Speech Recognition & Synthesis Across Languages

Sandy Ritchie, Richard Sproat, Kyle Gorman, Daan van Esch, Christian Schallhart, Nikos Bampounis, Benoît Brard, Jonas Fromseier Mortensen, Amelia Holt, Eoin Mahon

27. Better Morphology Prediction for Better Speech Systems

Dravyansh Sharma, Melissa Wilson, Antoine Bruguier

28. Dual Encoder Classifier Models as Constraints in Neural Text Normalization

Ajda Gokcen, Hao Zhang, Richard Sproat

29. Large-Scale Visual Speech Recognition

Brendan Shillingford, Yannis Assael, Matthew Hoffman, Thomas Paine, Cían Hughes, Utsav Prabhu, Hank Liao, Hasim Sak, Kanishka Rao, Lorrayne Bennett, Marie Mulville, Ben Coppin, Ben Laurie, Andrew Senior, Nando de Freitas

30. Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation

Fadi Biadsy, Ron Weiss, Pedro Moreno, Dimitri Kanevsky, Ye Jia

Source: Google Research Communications