Dual supervised learning for non-native speech recognition

Kacper Radzikowski, Robert Marek Nowak, Le Wang, Osamu Yoshie


Current automatic speech recognition (ASR) systems achieve over 90–95% accuracy, depending on the methodology applied and datasets used. However, the level of accuracy decreases significantly when the same ASR system is used by a non-native speaker of the language to be recognized. At the same time, the volume of labeled datasets of non-native speech samples is extremely limited both in size and in the number of existing languages. This problem makes it difficult to train or build sufficiently accurate ASR systems targeted at non-native speakers, which, consequently, calls for a different approach that would make use of vast amounts of large unlabeled datasets. In this paper, we address this issue by employing dual supervised learning (DSL) and reinforcement learning with policy gradient methodology. We tested DSL in a warm-start approach, with two models trained beforehand, and in a semi warm-start approach with only one of the two models pre-trained. The experiments were conducted on English language pronounced by Japanese and Polish speakers. The results of our experiments show that creating ASR systems with DSL can achieve an accuracy comparable to traditional methods, while simultaneously making use of unlabeled data, which obviously is much cheaper to obtain and comes in larger sizes.
Authors:
- Kacper Radzikowski (FEIT / IN), The Institute of Computer Science
- Robert Marek Nowak (FEIT / IN), The Institute of Computer Science
- Le Wang
- Osamu Yoshie
Journal series: EURASIP Journal on Audio Speech and Music Processing, ISSN 1687-4722, [1687-4714], (N/A 100 pts)
Issue year: 2019
Publication size in sheets: 0.5
Keywords in English: Speech recognition, Dual supervised learning, Reinforcement learning, Policy gradients, Non-native speaker, Machine learning, Deep learning, Artificial intelligence
ASJC Classification: 2208 Electrical and Electronic Engineering; 3102 Acoustics and Ultrasonics
URL: https://rdcu.be/bgUxy
Language: en (English)
File: 014-eurasip2019dual.pdf (1.33 MB)
Score (nominal): 100
Score source: journalList
Score: Ministerial score = 100.0, 20-10-2019, ArticleFromJournal
Publication indicators: WoS Citations = 0; Scopus SNIP (Source Normalised Impact per Paper): 2016 = 0.783; WoS Impact Factor: 2017 = 3.057 (2) - 2017 = 1.863 (5)