Advancing Bengali Dialect Identification (DiD) Through The BengDiDa Dataset & Dialect Classification
Abstract
Dialect Identification (DiD) is necessary for recognizing linguistic diversity and improving speech technologies, especially for Bengali, whose dialects are highly diverse owing to geographical, cultural, and social factors. Phonetic similarity between adjacent dialects, the scarcity of comprehensive datasets, and the dynamic nature of speech sounds make these dialects difficult to identify. In this paper, we introduce the BengDiDa dataset, an extensive Bengali dialect speech corpus. BengDiDa includes 48,000 audio samples from 20 distinct dialects and is designed to support the development of advanced modeling techniques. This research also examines the efficacy of Convolutional Recurrent Neural Networks for the DiD task. The results highlight the importance of effective feature extraction and the management of spatial and channel-wise dependencies, thereby advancing automatic speech recognition and contributing to the preservation of linguistic heritage. Among the evaluated architectures, a fine-tuned ResNet-50 backbone raises accuracy from 80.8% (F1 = 0.83) to 85.3% (F1 = 0.85), yet a purpose-built four-layer CNN + BiGRU achieves the best score of 90.3% (F1 = 0.90) on raw MFCC/GFCC/RPLP features, underscoring the advantage of lightweight, task-specific designs.
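The abstract reports results as accuracy and F1 over a multi-class dialect task. As a reading aid, the following is a minimal sketch of how such scores are commonly computed. It assumes macro-averaged F1, which the abstract does not state; the function names and the toy labels (`dialect_a`, `dialect_b`, `dialect_c`) are hypothetical and are not taken from BengDiDa.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted dialect matches the reference label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    Macro averaging is an assumption here; the abstract does not say
    which averaging scheme was used.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # reference class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with hypothetical dialect labels (not BengDiDa data).
y_true = ["dialect_a", "dialect_a", "dialect_b", "dialect_b", "dialect_c", "dialect_c"]
y_pred = ["dialect_a", "dialect_b", "dialect_b", "dialect_b", "dialect_c", "dialect_a"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Because macro F1 weights every dialect equally regardless of sample count, it can differ from accuracy when some dialects are harder to separate than others.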
Collections
- Undergraduate Thesis [19]