Speaker Diarization Using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings
Pawel Cyrta, Tomasz Trzciński, Wojciech Stokowiec
Abstract
In this paper we propose a new method of speaker diarization that employs a deep learning architecture to learn speaker embeddings. In contrast to traditional approaches that build their speaker embeddings from hand-crafted spectral features, we train a recurrent convolutional neural network applied directly to magnitude spectrograms. To compare our approach with the state of the art, we collect and publicly release an additional dataset of over 6 h of fully annotated broadcast material. Our evaluation on the new dataset and three other benchmark datasets shows that the proposed method significantly outperforms competing methods and reduces the diarization error rate by over 30% with respect to the baseline.
Author |
Pawel Cyrta - [Tooploox],
Tomasz Trzciński (FEIT / IN) - The Institute of Computer Science,
Wojciech Stokowiec - [Tooploox]
|
Pages | 107-117 |
Publication size in sheets | 0.5 |
Book |
Borzemski Leszek, Świątek Jerzy, Wilimowska Zofia (eds.): Information Systems Architecture and Technology: Proceedings of 38th International Conference on Information Systems Architecture and Technology – ISAT 2017. Part I, Advances in Intelligent Systems and Computing, vol. 655, 2018, Springer International Publishing, ISBN 978-3-319-67219-9, [978-3-319-67220-5], 358 p., DOI:10.1007/978-3-319-67220-5
|
Keywords in English | Speaker diarization, Speaker embeddings, Speaker clustering, Deep neural network, Recursive convolutional neural networks, Convolutional neural networks |
DOI | DOI:10.1007/978-3-319-67220-5_10 |
URL |
https://link.springer.com/chapter/10.1007/978-3-319-67220-5_10 |
Project | Development of new algorithms in the areas of software and computer architecture, artificial intelligence and information systems and computer graphics. Project leader: Arabas Jarosław, Phone: +48 22 234 7432, start date 01-08-2018, end date 30-09-2019, II/2018/DS/1, Completed
WEiTI
Statutory activity
|
Language | en (English) |
Score (nominal) | 20 |
Score source | publisherList |
Score | Ministerial score = 20.0, 20-10-2019, ChapterFromConference |
Publication indicators |
WoS Citations = 1;
Scopus Citations = 2 |