THE DEAN and THE COUNCIL OF THE FACULTY
OF COMPUTER SCIENCE, ELECTRONICS AND TELECOMMUNICATIONS
AKADEMIA GÓRNICZO-HUTNICZA im. ST. STASZICA W KRAKOWIE
invite you to
the public discussion of the doctoral dissertation
of mgr inż. Piotr Żelasko
CHALLENGES IN SPEECH RECOGNITION INDUSTRY: DATA COLLECTION, TEXT NORMALIZATION AND PUNCTUATION MODELLING
The discussion will take place on 30 May 2019 at 15:00 in room 1.36,
ul. Kawiory 21, pavilion D-17
SUPERVISOR: Dr hab. inż. Bartosz Ziółko, Akademia Górniczo-Hutnicza im. St. Staszica w Krakowie
REVIEWERS: Prof. dr hab. Zygmunt Vetulani, Uniwersytet im. Adama Mickiewicza w Poznaniu
Dr hab. inż. Artur Janicki, Politechnika Warszawska
The doctoral dissertation and the reviewers' opinions are available
in the Reading Room of the AGH Main Library, al. Mickiewicza 30
mgr inż. Piotr Żelasko
Supervisor: dr hab. inż. Bartosz Ziółko (AGH)
This thesis investigates several approaches to building an automatic speech recognition system with an application-oriented focus. Three major hypotheses are investigated.
The first is that a carefully designed process for the automated collection of annotated recordings provides superior data for acoustic model training compared to existing Polish corpora. The second relates to text normalization for language model preparation and states that a substantial number of abbreviations in strongly inflected languages can be expanded to their full, morphologically correct forms by applying a recurrent neural network model whose predictions are based only on the morphosyntactic features of a sentence. The third is that punctuation can be restored in transcripts of conversational speech by means of deep neural network models and word timing features, where the model processes both sides of the conversation at once.
The research starts with an overview of the datasets available for Polish speech recognizer development and outlines the problem of limited data availability. An initial approach is suggested in which expert rules are incorporated into the system, but due to its limited efficiency, the research steers towards finding a method to produce a high-quality acoustic dataset. A voice over IP recording tool capable of automatic annotation is then described. Recordings obtained in this part of the research were used to train the acoustic model in a commercial speech recognition system.
Another part of the research investigates the text-related aspects of automatic speech recognition system development. It briefly describes the specific text normalization problems encountered in strongly inflected languages and explains why they matter for speech recognition. A method for automatically expanding abbreviations is then suggested and evaluated on two publicly available Polish text corpora.
Finally, the last part of the research focuses on designing a punctuation prediction model for conversational speech. To that end, a novel application of the Needleman-Wunsch algorithm is proposed to create a training and evaluation dataset using the English Fisher corpus. Two punctuation prediction models are investigated, one based on a convolutional neural network and the other on a bidirectional recurrent neural network. Both models achieve competitive results and have been successfully implemented as part of a commercial system.
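As a rough illustration of the alignment technique named above, the following is a minimal sketch of Needleman-Wunsch global alignment applied to two word sequences. The abstract does not describe how the thesis applies the algorithm, so the function name `needleman_wunsch`, the scoring values (match, mismatch, gap) and the example sentences are assumptions made purely for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two token sequences.

    Returns (score, pairs), where pairs is a list of (token_a, token_b)
    tuples and None marks a gap. Scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two tokens
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Hypothetical example: one word sequence contains a filler word the other lacks.
reference = ["how", "are", "you", "today"]
hypothesis = ["how", "are", "uh", "you", "today"]
total, alignment = needleman_wunsch(reference, hypothesis)
```

In this sketch the extra word `uh` ends up paired with `None`, which is how an alignment of this kind can expose tokens present in only one of the two sequences.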