Bangla Speaker Accent Variation Classification Using Deep Neural Network: A Distinct Approach
Date
2023-07
Author
Alam, Khorshed
Bhuiyan, Mahbubul Haq
Monir, Md. Fahad
Abstract
Accent variation classification is the task of identifying the accent or dialect of a speaker from patterns and features in their speech. It is useful for developing speech recognition systems, language learning systems, dialect preservation systems, and voice assistants, for sociolinguistic studies, and for improving speech synthesis and voiceover systems. It can also support forensic analysis of audio data to determine a speaker's regional origin or specific accent traits, which makes it a useful tool in criminal investigations and judicial proceedings. Deep Neural Networks (DNNs) are used for speech recognition tasks because they can learn complex characteristics of speech input such as patterns, intensity, rhythm, and temporal information. In this study, we propose a Bangla Speaker Accent Variation Classification model that combines feature extraction based on Zero Crossing Rate (ZCR), Mel Frequency Cepstral Coefficients (MFCC), Root Mean Square (RMS), and Mel spectrograms with a DNN classifier to identify the speaker's accent variation from Bangla speech data. We train our model on 7443 of 9303 audio recordings covering nine varieties (Formal, Dhaka, Khulna, Barisal, Rajshahi, Sylhet, Chittagong, Mymensingh and Noakhali), and it achieves 94% accuracy on unseen data. We compare its accuracy and performance with other neural networks: LSTM, Stacked LSTM and DCNN achieve accuracies of 67%, 71% and 85%, respectively.
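To make two of the features named in the abstract concrete, the following is a minimal sketch of frame-wise Zero Crossing Rate and Root Mean Square computation in plain Python. It is not the authors' pipeline: the frame length, hop length, sample rate, mean-pooling step, and all function names are assumptions chosen for illustration, and the input is a synthetic sine tone instead of Bangla speech. MFCC and Mel spectrogram features are omitted here; in practice they would typically come from an audio library such as librosa.

```python
import math


def frame_signal(samples, frame_length, hop_length):
    """Split a 1-D sequence into overlapping frames.

    No padding is applied; a trailing partial frame is dropped.
    """
    last_start = len(samples) - frame_length
    return [samples[i:i + frame_length]
            for i in range(0, last_start + 1, hop_length)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def root_mean_square(frame):
    """Square root of the mean squared amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def extract_features(samples, frame_length=2048, hop_length=512):
    """Return [mean ZCR, mean RMS] over all frames (assumed pooling scheme)."""
    frames = frame_signal(samples, frame_length, hop_length)
    if not frames:
        raise ValueError("signal is shorter than one frame")
    zcr_values = [zero_crossing_rate(f) for f in frames]
    rms_values = [root_mean_square(f) for f in frames]
    return [sum(zcr_values) / len(zcr_values),
            sum(rms_values) / len(rms_values)]


# Synthetic stand-in for a recording: one second of a 440 Hz tone
# with amplitude 0.5, sampled at an assumed 16 kHz.
sample_rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]
features = extract_features(tone)
```

A vector like `features`, extended with MFCC and Mel spectrogram statistics, is the kind of fixed-length input a dense classifier could consume. For the split reported in the abstract, 7443 of 9303 recordings is roughly 80% of the data, which leaves 1860 recordings outside the training set.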