Dual supervised learning for non-native speech recognition
Kacper Radzikowski, Robert Marek Nowak, Le Wang, Osamu Yoshie
Abstract
Current automatic speech recognition (ASR) systems achieve over 90–95% accuracy, depending on the methodology applied and datasets used. However, the level of accuracy decreases significantly when the same ASR system is used by a non-native speaker of the language to be recognized. At the same time, the volume of labeled datasets of non-native speech samples is extremely limited both in size and in the number of existing languages. This problem makes it difficult to train or build sufficiently accurate ASR systems targeted at non-native speakers, which, consequently, calls for a different approach that would make use of vast amounts of large unlabeled datasets. In this paper, we address this issue by employing dual supervised learning (DSL) and reinforcement learning with policy gradient methodology. We tested DSL in a warm-start approach, with two models trained beforehand, and in a semi warm-start approach with only one of the two models pre-trained. The experiments were conducted on English language pronounced by Japanese and Polish speakers. The results of our experiments show that creating ASR systems with DSL can achieve an accuracy comparable to traditional methods, while simultaneously making use of unlabeled data, which obviously is much cheaper to obtain and comes in larger sizes.
|Journal series||EURASIP Journal on Audio Speech and Music Processing, ISSN 1687-4722, [1687-4714], (N/A 100 points)|
|Publication size in sheets||0.5|
|Keywords in English||Speech recognition, Dual supervised learning, Reinforcement learning, Policy gradients, Non-native speaker, Machine learning, Deep learning, Artificial intelligence|
|Score||= 100.0, 20-10-2019, ArticleFromJournal|
|Publication indicators||= 0; : 2016 = 0.783; : 2017 = 3.057 (2) - 2017=1.863 (5)|
* The presented citation count is obtained through analysis of information on the Internet and is close to the number calculated by the Publish or Perish system.