Audio style transfer in non-native speech recognition

Kacper Radzikowski


Current automatic speech recognition (ASR) systems achieve accuracy of over 90-95%, depending on the methodology applied and the datasets used. However, accuracy drops significantly when an ASR system is used by a non-native speaker of the language to be recognized, mainly because of specific pronunciation features. At the same time, labeled datasets of non-native speech samples are extremely limited, both in size and in the number of languages covered, which makes it difficult to train sufficiently accurate ASR systems targeted at non-native speakers. A different method is therefore necessary. In this paper, we suggest an alternative approach to the problem that employs the so-called style transfer methodology. Style transfer, used mainly in the graphical domain until now, could help solve the problem of non-native speech. Another advantage is that the style transfer algorithm could be compatible with existing ASR systems, so it would not be necessary to train new systems, which can be difficult and time-consuming.
Author: Kacper Radzikowski (FEIT / IN)
- The Institute of Computer Science
Publication size in sheets: 0.3
Book: Romaniuk Ryszard, Linczuk Maciej Grzegorz (eds.): Proceedings of SPIE: Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2018, Proceedings of SPIE: The International Society for Optical Engineering, vol. 10808, 2018, SPIE - the International Society for Optics and Photonics, ISBN 9781510622036, 2086 p., DOI:10.1117/12.2504983
Keywords in English: speech recognition, style transfer, non-native speaker, machine learning, deep learning, artificial intelligence
Project: Development of new algorithms in the areas of software and computer architecture, artificial intelligence and information systems, and computer graphics. Project leader: Arabas Jarosław, Phone: +48 22 234 7432, start date 01-08-2018, planned end date 31-12-2018, II/2018/DS/1, Implemented
WEiTI statutory activity
Language: English (en)
Score (nominal): 15
Score: Ministerial score = 15.0, BookChapterSeriesAndMatConf
Ministerial score (2013-2016) = 15.0, BookChapterSeriesAndMatConf