Adaptation of Acoustic and Language Model for Improving Arabic Automatic Speech Recognition (MSc. Thesis)

Oussama Enshassi

Automatic Speech Recognition (ASR) is the translation of spoken words into text by a computer. ASR technology has been widely integrated into many systems. However, Arabic speech recognition applications still suffer from a high error rate, mainly due to variation in speech, which leads to a mismatch between the Arabic speech being recognized and the trained models.

Variation in speech is a major obstacle to improving the accuracy of Arabic continuous speech recognition applications. Variability may occur at the phonetic, word, or sentence level. In this thesis, the researcher proposes an approach to adapting the acoustic model and the language model for Arabic speakers under limited resources. Preliminary work on the pronunciation model has also been carried out.

Acoustic model adaptation is proposed to overcome the variation in speech under limited resources for Arabic speakers. When several Arabic acoustic models are available, a hybrid approach that combines interpolation and merging of acoustic models can be used to adapt the target acoustic model. The proposed approach proved effective in handling the variability in Arabic speech. The Word Error Rate (WER) was measured for both the baseline and the Enhanced system: the baseline WER of 13.28% decreased significantly to 11.04% in the Enhanced system.
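
For readers unfamiliar with the metric, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal Python sketch of that standard definition; it is not the scoring tool used in the thesis, and the function names word_error_rate and wer_reduction are hypothetical names chosen for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def wer_reduction(baseline: float, enhanced: float) -> tuple[float, float]:
    """Return (absolute reduction, relative reduction) between two WER values."""
    absolute = baseline - enhanced
    relative = absolute / baseline
    return absolute, relative


# Figures reported above for acoustic model adaptation, in percent.
absolute, relative = wer_reduction(13.28, 11.04)
print(f"absolute: {absolute:.2f} points, relative: {relative:.1%}")
```

Reporting both the absolute and the relative reduction avoids ambiguity when an improvement is itself quoted as a percentage.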

In addition, the researcher proposed an interpolation approach for adapting the Arabic language model. The results showed a baseline WER of 12.4%, which declined significantly to 8.4% in the Enhanced system. Furthermore, applying the hybrid acoustic approach followed by the language model interpolation approach achieved a considerable absolute WER improvement of 5.32 percentage points: the baseline WER of 13.28% was reduced significantly to 7.96% in the Enhanced system.
However, the proposed phonetic rules in the pronunciation model did not lead to a significant improvement.
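
The abstract does not specify the form of language model interpolation used. A common choice is linear interpolation, in which the adapted probability is a weighted sum of two models and the weight is tuned on held-out text by minimizing perplexity. The Python sketch below illustrates this under simplifying assumptions: unigram models stored as dictionaries, a coarse grid search over the weight, and hypothetical names (interpolate, perplexity, best_weight) and toy probabilities that do not come from the thesis.

```python
import math


def interpolate(p_general: dict, p_domain: dict, lam: float) -> dict:
    """Linear interpolation: P(w) = lam * P_domain(w) + (1 - lam) * P_general(w)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    vocab = set(p_general) | set(p_domain)
    return {
        w: lam * p_domain.get(w, 0.0) + (1.0 - lam) * p_general.get(w, 0.0)
        for w in vocab
    }


def perplexity(model: dict, words: list) -> float:
    """Perplexity of a unigram model on a word sequence (inf if any word has P = 0)."""
    log_sum = 0.0
    for w in words:
        p = model.get(w, 0.0)
        if p <= 0.0:
            return math.inf
        log_sum += math.log(p)
    return math.exp(-log_sum / len(words))


def best_weight(p_general: dict, p_domain: dict, dev_words: list, steps: int = 11) -> float:
    """Grid-search the interpolation weight that minimizes held-out perplexity."""
    candidates = [i / (steps - 1) for i in range(steps)]
    return min(
        candidates,
        key=lambda lam: perplexity(interpolate(p_general, p_domain, lam), dev_words),
    )


# Toy models (assumed values, for illustration only).
general = {"a": 0.5, "b": 0.5}
domain = {"a": 0.9, "c": 0.1}
lam = best_weight(general, domain, ["a", "b", "c"])
print("selected weight:", lam)
```

Because each component model sums to one, the interpolated model does as well for any weight between 0 and 1. The same weighting extends to n-gram models by applying it to each conditional probability.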

Keywords: Arabic automatic speech recognition, acoustic modeling, language modeling, Modern Standard Arabic.