Comparative analysis of Turkish and Hungarian end-to-end speech recognition
Training and implementing an end-to-end deep neural network-based speech recognition system requires no language-specific knowledge beyond the speech and text data themselves. However, such a system behaves differently for English and Hungarian, for example, which can be explained by the very different morphological structure of these languages. Turkish and Hungarian, in turn, are believed to share similar morphological features; therefore, comparing them in end-to-end automatic speech recognition may reveal new insights. The primary task is to conduct parallel speech recognition experiments on Turkish and Hungarian, using training and test data that are similar in length and nature, and then to evaluate and analyze the results comparatively.
Tasks to be performed by the student will include:
Review published end-to-end automatic speech recognition results for Turkish and Hungarian.
Study technologies applicable to end-to-end deep neural network-based speech recognition, with an emphasis on neural architectures suitable for self-supervised pre-training.
Design speech recognition experiments to obtain baselines for Turkish and Hungarian.
Create end-to-end neural network-based automatic speech recognition models and optimize their hyper-parameters.
Perform a comparative analysis of the Turkish and Hungarian models and results. Report word and character error rates, the real-time factor of inference, and memory requirements.
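The evaluation metrics named in the last task could be computed along the following lines. This is a minimal sketch, not a prescribed evaluation tool: the function names (edit_distance, error_rate, real_time_factor) and the transcribe callable are hypothetical, and text normalization (casing, punctuation) is assumed to have been applied beforehand. Word and character error rates are taken as the Levenshtein distance between reference and hypothesis divided by the reference length, pooled over all utterances; the real-time factor is taken as processing time divided by audio duration.

```python
import time
from typing import Any, Callable, List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(refs: List[str], hyps: List[str], unit: str = "word") -> float:
    """Pooled error rate over all utterances; unit is 'word' (WER) or 'char' (CER)."""
    if unit not in ("word", "char"):
        raise ValueError("unit must be 'word' or 'char'")
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        r = ref.split() if unit == "word" else list(ref)
        h = hyp.split() if unit == "word" else list(hyp)
        errors += edit_distance(r, h)
        total += len(r)
    if total == 0:
        raise ValueError("references contain no units")
    return errors / total


def real_time_factor(transcribe: Callable[[Any], str], audio: Any,
                     audio_seconds: float) -> float:
    """Wall-clock decoding time divided by audio duration (hypothetical decoder)."""
    start = time.perf_counter()
    transcribe(audio)
    return (time.perf_counter() - start) / audio_seconds


# Illustrative Turkish pair: a single missing suffix letter.
refs = ["evlerimizden geldik"]
hyps = ["evlerimizde geldik"]
wer = error_rate(refs, hyps, unit="word")
cer = error_rate(refs, hyps, unit="char")
```

In this illustrative pair, one missing letter in a suffix makes one of the two words wrong, so the word error rate is 0.5, while the character error rate is only 1/19. This gap is one reason to report both metrics for morphologically rich languages such as Turkish and Hungarian.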