<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
<channel>
<title>School of Engineering, Technology &amp; Sciences</title>
<link>https://ar.iub.edu.bd/handle/11348/8</link>
<description>SETS</description>
<pubDate>Thu, 09 Apr 2026 12:45:02 GMT</pubDate>
<dc:date>2026-04-09T12:45:02Z</dc:date>
<item>
<title>A Cross-Modal Benchmark of Deep Learning Encoders for Automated Liver Segmentation</title>
<link>https://ar.iub.edu.bd/handle/11348/1077</link>
<description>A Cross-Modal Benchmark of Deep Learning Encoders for Automated Liver Segmentation
Rahman, Redwan; Nahar, Kazi Toushia
Many clinical processes, including liver transplant scheduling, volumetry, and the management of chronic diseases such as cirrhosis and fibrosis, rely on accurate segmentation of the liver. Incorrect segmentation poses significant risks to patient safety, for example through inaccurate tracking of disease progression. Domain shift occurs when models trained on Magnetic Resonance Imaging (MRI) fail to generalize to Computed Tomography (CT); this problem seriously limits the clinical utility of such systems, even though Deep Learning (DL) has demonstrated great potential in automating the task. Although state-of-the-art Unsupervised Domain Adaptation (UDA) methods are introduced regularly, little is known about the inherent cross-modality robustness of simple encoder architectures. To compare adaptability, a baseline was established using four popular encoder backbones in a standard U-Net architecture: ResNet18, EfficientNetB3, DenseNet121, and MobileNetV2. Models were trained on the LiverHccSeg dataset (arterial-phase T1-weighted MRI) to evaluate model-specific performance differences, and were subsequently tested on three external sets representing varying levels of domain shift: CHAOS (MRI), and 3D-IRCADb-01 and SLIVER07 (CT). The systematic analysis indicates that in-domain accuracy does not determine cross-modality stability. DenseNet121 achieved the best performance on the in-domain test set (Dice Similarity Coefficient [DSC] = 0.928) but generalized poorly.
Conversely, EfficientNetB3 was the most robust design, delivering the best performance on both the external MRI protocol (CHAOS, DSC 0.852) and cross-modality CT data (3D-IRCADb-01, DSC 0.825). This suggests that compound scaling principles help the network learn cross-modality structural representations that are essential for handling various liver diseases. ResNet18, in turn, demonstrated the weakest cross-modality transfer (DSC 0.769-0.788 on CT), indicating susceptibility to overfitting intensity patterns specific to the training modality. Furthermore, the lightweight MobileNetV2 showed comparable generalization (DSC ∼0.81 on CT), demonstrating that strong performance and computational efficiency are compatible. The main limitations of this study are the use of 2D slice processing, which prevents the model from capturing full 3D volumetric context, and the comparatively small training cohort (n = 17). To further reduce the patient risks related to domain shift, future research should focus on validating these results on larger, multi-center datasets and on investigating 3D architectures. Ultimately, these findings suggest that choosing the appropriate encoder is an important first step in developing cross-modal segmentation systems reliable enough for widespread clinical application.
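The Dice Similarity Coefficient used throughout the benchmark can be computed as a minimal sketch on binary masks (masks here are flat 0/1 lists; real pipelines would operate on image arrays):

```python
# Dice Similarity Coefficient (DSC) between two binary segmentation
# masks, the evaluation metric reported in the benchmark above.
# Illustrative sketch; masks are flat lists of 0/1 pixel labels.
def dice(pred, target, eps=1e-7):
    inter = sum(p * t for p, t in zip(pred, target))  # overlap area
    total = sum(pred) + sum(target)                   # combined area
    return (2.0 * inter + eps) / (total + eps)        # 2|A∩B| / (|A|+|B|)
```

Identical masks score 1.0 and disjoint masks score ≈0, which is why a drop from DSC 0.928 in-domain to ∼0.77 on CT signals a substantial loss of overlap.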
</description>
<pubDate>Mon, 01 Dec 2025 00:00:00 GMT</pubDate>
<guid isPermaLink="false">https://ar.iub.edu.bd/handle/11348/1077</guid>
<dc:date>2025-12-01T00:00:00Z</dc:date>
</item>
<item>
<title>Dual-Task Real-Time Low-Light Lane and Pothole Detection for Resource-Constrained Environments</title>
<link>https://ar.iub.edu.bd/handle/11348/1076</link>
<description>Dual-Task Real-Time Low-Light Lane and Pothole Detection for Resource-Constrained Environments
Alam, Md Iftekharul
Lane detection and road hazard awareness are crucial for ensuring safety in autonomous driving and Advanced Driver-Assistance Systems (ADAS). These systems rely heavily on clear visual cues, which are often compromised in low-light driving scenarios. The challenge is especially pronounced in low- and middle-income countries (LMICs), where poorly illuminated roads, faded lane markings, and unmaintained surfaces frequently co-occur. Under such conditions, conventional single-model detectors trained for daytime environments degrade sharply, as lane cues and pothole textures often compete in the same field of view. To address this, we present a lightweight dual-model pipeline that integrates a low-light enhancement front end with an OpenCV-based lane delineation pipeline and a YOLOv12 detector for pothole localization. The models run in parallel on shared inputs, and their outputs are fused to generate a unified lane geometry and hazard map in a single pass. The architecture is optimized for modest compute and memory budgets, enabling deployment in resource-constrained settings while maintaining high throughput. Evaluated on evening-time urban road scenes from Bangladesh, the system achieves 88.7% for potholes and 89.3 FPS on an NVIDIA GTX 1050Ti, outperforming a single-detector baseline. These results highlight the potential of our approach for practical, real-time ADAS perception in underserved regions.
Index Terms: Low-light imaging, Lane detection, Pothole detection, YOLOv12, OpenCV, Image enhancement, Edge computing, Autonomous driving
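The single-pass fusion step described above can be sketched as follows; the function name, record layout, and 0.5 confidence gate are illustrative assumptions, not the authors' API:

```python
# Minimal sketch of the output-fusion step: the lane pipeline and the
# pothole detector run in parallel on the same frame, and their results
# are merged into one lane-geometry-plus-hazard map per frame.
def fuse_outputs(lane_lines, pothole_boxes, frame_id, min_conf=0.5):
    """lane_lines: list of (x1, y1, x2, y2) segments;
    pothole_boxes: list of (x, y, w, h, confidence) detections."""
    hazards = [b for b in pothole_boxes if b[4] > min_conf]  # gate weak detections
    return {
        "frame": frame_id,
        "lanes": lane_lines,       # lane geometry from the OpenCV branch
        "hazards": hazards,        # pothole boxes from the YOLO branch
        "hazard_count": len(hazards),
    }
```

Because the two branches share the enhanced input frame and only their lightweight outputs are merged, the fusion adds negligible latency, consistent with the real-time throughput claimed above.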
</description>
<pubDate>Wed, 01 Jan 2025 00:00:00 GMT</pubDate>
<guid isPermaLink="false">https://ar.iub.edu.bd/handle/11348/1076</guid>
<dc:date>2025-01-01T00:00:00Z</dc:date>
</item>
<item>
<title>Enhancing Violence Detection: Comparative Analysis of Frame Difference and Conventional Methods Using the UCF Crime Dataset</title>
<link>https://ar.iub.edu.bd/handle/11348/1075</link>
<description>Enhancing Violence Detection: Comparative Analysis of Frame Difference and Conventional Methods Using the UCF Crime Dataset
Islam, Md. Zahidul; Sani, Saniul Islam
Violence is one of the world's biggest concerns, causing major social and economic harm, as both the World Health Organization and the Institute for Economics and Peace have reported. Video surveillance has become a central tool for keeping people safe, but traditional systems are only effective when someone watches the video. Human viewers are limited and often miss violent events as they happen, and monitoring becomes increasingly difficult as the number of cameras grows. This underlines the importance of automated real-time detection of violent events. The aim of this thesis is to create a deep learning method that makes detecting these events more accurate, fast, and efficient. It proposes a hybrid approach
combining two powerful tools: Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). A pre-trained ResNet50 model extracts spatial features, followed by a Long Short-Term Memory (LSTM) network that captures temporal features. This approach models both spatial and temporal information well while requiring less processing power than more complex models such as 3D CNNs. Two ways of preprocessing the video data for the model are explored. The first samples evenly spaced RGB frames; the second computes the absolute difference between consecutive frames. Frame differences highlight motion, which is important for identifying violence, and suppress noise in stationary backgrounds. This makes detection more reliable and less computationally demanding, and therefore practical for real-time applications. The UCF Crime dataset, containing more than 1,900 real-life videos, is used for testing; these videos present challenges such as varying light intensity, diverse camera angles, and an imbalance between violent and non-violent clips. Both methods were tested under equal conditions,
using accuracy and the Area Under the Receiver Operating Characteristic Curve (AUROC) as evaluation metrics. Both the RGB-frame method and the frame-difference method achieved 85% accuracy, but the frame-difference approach obtained a higher AUROC (0.9091 versus 0.8892), indicating that it separates the two classes more effectively. It also reduced training time by 2.9% and inference time by 4.2%, making it the more accurate and efficient choice for CNN-LSTM violence detection. By focusing on motion, the model can separate violent from non-violent scenes even in crowded settings, and its efficiency makes it suitable for resource-poor devices. It can be integrated into existing surveillance systems in public centers, public transport, schools, airports, and other busy places where violence must be detected immediately, offering a practical step towards intelligent surveillance and public safety in complex urban settings.
</description>
<pubDate>Fri, 01 Aug 2025 00:00:00 GMT</pubDate>
<guid isPermaLink="false">https://ar.iub.edu.bd/handle/11348/1075</guid>
<dc:date>2025-08-01T00:00:00Z</dc:date>
</item>
<item>
<title>Ensemble Deep Learning for Retinal Disease Detection from Fundus Images: A Bilateral Ensemble Approach for Retinal Disease Classification</title>
<link>https://ar.iub.edu.bd/handle/11348/1068</link>
<description>Ensemble Deep Learning for Retinal Disease Detection from Fundus Images: A Bilateral Ensemble Approach for Retinal Disease Classification
Shanto, Abdullah Al Alam; Esha, Mumtahina; Alam, Sadia
Retinal diseases, including Age-related Macular Degeneration (AMD), Diabetic Retinopathy (DR), Cataract, and Myopia, are among the most prevalent causes of irreversible vision loss worldwide. Early and accurate diagnosis is vital for effective treatment and management, yet manual assessment of retinal fundus images is time-consuming, subject to inter-observer variability, and often challenging due to subtle pathological features. Motivated by these clinical demands, we experimented with the classification of retinal diseases using deep learning models, starting from conventional deep learning methods and advancing towards a bilateral ensemble solution.
The first phase of our experiments used convolutional neural networks and vision transformers for multi-class classification of individual ocular fundus images. These conventional approaches showed promise, but they were limited in their capacity to extract complex disease-specific features and were restricted to analysing a single eye, which can make their diagnostic accuracy unreliable: single-eye diagnosis cannot capture the potential correlations between both eyes of the same individual. These limitations were evident in experimental scenarios where both eyes presented disease cues and the method could not capture the complementary information.
In view of these limitations, the next stage of experimentation employed various ensemble learning strategies. Ensemble methods aimed to incorporate a richer set of features by combining the predictions of several state-of-the-art deep learning architectures. The comparative analysis examined both decision-ensemble and feature-ensemble techniques, finding that although ensemble methods improved the overall classification task over conventional models, certain problems remained. Above all, the ensemble strategies still treated the images from each eye as separate entities, ignoring valuable information present in both eyes of a subject affected by the same eye disease.
Our major contribution, based on these insights, is the design and validation of a custom bilateral ensemble framework for the classification of retinal diseases. This strategy is unique in combining the corresponding fundus images from both eyes of a subject through a ConvNeXt-XLarge backbone with preprocessing and a bilateral feature-fusion technique. By using the disease features of both eyes together, the proposed framework showed promising performance for multi-class disease detection. Comprehensive experiments on the publicly available OIA-ODIR dataset demonstrated that the bilateral ensemble approach overcomes the limitations of conventional methods, delivering improved and more reliable results.
This research establishes a foundation for advanced deep learning in eye-disease detection. Our work shows that systematic improvements to data presentation and system architecture can lead to better diagnostic performance and greater potential for artificial-intelligence-based systems in medicine. These findings support future computational ophthalmology research and the development of practical diagnostic tools for global eye health.
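The bilateral feature-fusion idea can be sketched as combining per-eye feature vectors into one representation before classification; concatenation is shown here as one common fusion choice, and the function name and vector layout are illustrative assumptions rather than the framework's exact design:

```python
# Sketch of bilateral feature fusion: features extracted from the left
# and right fundus images of one subject are joined into a single
# representation that a classifier head then scores, so cues present
# in either eye can contribute to the prediction.
def fuse_bilateral(left_features, right_features):
    if len(left_features) != len(right_features):
        raise ValueError("per-eye feature vectors must have equal length")
    return left_features + right_features  # concatenation fusion
```

The fused vector is twice the per-eye dimensionality, so the classifier head must be sized accordingly; this is the step that lets correlated findings in the two eyes reinforce each other instead of being scored separately.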
</description>
<pubDate>Mon, 01 Dec 2025 00:00:00 GMT</pubDate>
<guid isPermaLink="false">https://ar.iub.edu.bd/handle/11348/1068</guid>
<dc:date>2025-12-01T00:00:00Z</dc:date>
</item>
</channel>
</rss>
