dc.description.abstract | There are over 265 million native and non-native Bangla speakers; however, progress in Bangla Optical Character Recognition lags behind that of other languages because of a larger set of complex characters, varied handwriting styles, and a lack of datasets. Convolutional Neural Network (CNN) models have been highly successful in recognizing handwritten alphabet scripts. However, we found that two-stage detectors, such as CNN-RNN models, encoder-decoders, and Vision Transformers, now perform much better than pure CNNs in pattern recognition and Bengali Compound Character Recognition. To understand why, we chose five commonly used pretrained CNN models from PyTorch (VGG-16, ResNet-50, ResNet-101, Wide ResNet-50-2, and ResNeXt-50-32x4d), fine-tuned them to classify the characters, and compared their performance. Grad-CAM and Grad-CAM++ were used to generate heatmaps showing the key areas the models focused on while classifying. We identified problematic patterns in Bangla compound characters, along with flawed perceptions in our fine-tuned CNNs, and list them with detailed analysis. | en_US |