<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel rdf:about="https://ar.iub.edu.bd/handle/123456789/624">
<title>Undergraduate Thesis</title>
<link>https://ar.iub.edu.bd/handle/123456789/624</link>
<description>By CSE Department</description>
<items>
<rdf:Seq>
<rdf:li rdf:resource="https://ar.iub.edu.bd/handle/11348/1168"/>
<rdf:li rdf:resource="https://ar.iub.edu.bd/handle/11348/1167"/>
<rdf:li rdf:resource="https://ar.iub.edu.bd/handle/11348/1166"/>
<rdf:li rdf:resource="https://ar.iub.edu.bd/handle/11348/1077"/>
</rdf:Seq>
</items>
<dc:date>2026-05-06T11:57:13Z</dc:date>
</channel>
<item rdf:about="https://ar.iub.edu.bd/handle/11348/1168">
<title>A Baseline Analysis of Cross-Modal Liver Tumor Segmentation and the Role of Frozen Encoder</title>
<link>https://ar.iub.edu.bd/handle/11348/1168</link>
<description>A Baseline Analysis of Cross-Modal Liver Tumor Segmentation and the Role of Frozen Encoder
Iqbal, Md. Zafor
Accurate liver tumor segmentation is critical for surgical planning, volumetry, and treatment monitoring across diverse liver pathologies and clinical applications. Both computed tomography (CT) and magnetic resonance imaging (MRI) serve as essential modalities in clinical practice, yet segmentation models trained on one modality typically fail when applied to the other. This limitation reflects a fundamental gap in understanding why complex architectural innovations are necessary for cross-modal robustness: most of the literature proposes sophisticated solutions without first establishing what simpler methods achieve or where they encounter irreducible obstacles.

This thesis provides a systematic baseline analysis of cross-modal liver and tumor segmentation, deliberately adopting simple approaches to characterize their capabilities and limitations. The study employs a frozen ResNet18 encoder combined with a U-Net decoder within a two-stage pipeline that first segments the liver, then detects lesions within the hepatic region. This straightforward architecture is evaluated across five public datasets spanning both modalities: LiverHCCSeg and CHAOS for MRI, and LiTS, 3D-IRCADb-01, and SLiver07 for CT.
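The second stage of the pipeline described above restricts lesion detection to the hepatic region found by the first stage. As a small illustrative sketch (not the thesis code; the threshold and the toy array shapes are placeholder assumptions), the masking step amounts to elementwise logic on the two probability maps:

```python
import numpy as np

def two_stage_lesion_mask(liver_prob, lesion_prob, thr=0.5):
    """Stage 1: threshold the liver probability map.
    Stage 2: keep lesion predictions only inside the predicted liver."""
    liver_mask = liver_prob >= thr       # binary liver segmentation
    lesion_mask = lesion_prob >= thr     # raw lesion predictions
    return lesion_mask & liver_mask      # lesions restricted to the liver

# toy 2x2 example: the lesion predicted outside the liver is suppressed
liver = np.array([[0.9, 0.1], [0.8, 0.2]])
lesion = np.array([[0.7, 0.9], [0.1, 0.1]])
combined = two_stage_lesion_mask(liver, lesion)
```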
</description>
<dc:date>2026-04-01T00:00:00Z</dc:date>
</item>
<item rdf:about="https://ar.iub.edu.bd/handle/11348/1167">
<title>Object Detection and Localization Using Lightweight Convolutional Neural Networks for Low-Resolution Thermal Sensor Data</title>
<link>https://ar.iub.edu.bd/handle/11348/1167</link>
<description>Object Detection and Localization Using Lightweight Convolutional Neural Networks for Low-Resolution Thermal Sensor Data
Barua, Protik
Embedded monitoring systems that rely on RGB cameras raise well-documented privacy concerns in sensitive spaces. Low-resolution thermal sensors offer an alternative: the Panasonic AMG8833 Grid-EYE outputs only an 8×8 grid of 64 temperature values, too coarse to identify individuals, yet structured enough to distinguish an empty room, a person, and a fire event. This thesis proposes JFilterLocalizationCNN, a lightweight dual-head convolutional neural network for simultaneous classification and heat-source localization on native 8×8 thermal data. Three classes are used (No Object, Object, and Object with Fire), together with normalized (x, y) coordinates for the dominant heat source. A J-filter removes ambient offset via per-frame background subtraction; normalization maps readings to [0, 1] for stable training without distorting the spatial signatures needed for localization.
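A minimal sketch of the preprocessing just described, assuming the ambient offset is estimated as the per-frame minimum (the thesis's exact offset estimate may differ):

```python
import numpy as np

def j_filter_normalize(frame):
    """Per-frame background subtraction followed by min-max scaling to [0, 1].
    Using the frame minimum as the ambient offset is an illustrative assumption."""
    centered = frame - frame.min()          # remove ambient offset
    span = centered.max()
    if span == 0:                           # uniform frame: no heat structure
        return np.zeros_like(frame, dtype=float)
    return centered / span                  # preserve relative heat structure

frame = np.array([[22.0, 22.5], [23.0, 26.0]])  # toy 2x2 "thermal" frame
out = j_filter_normalize(frame)
```

Because the scaling is per-frame, relative spatial structure (which pixel is hottest, and by how much) survives, which is what the localization head needs.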
Training uses AdamW with combined cross-entropy and MSE losses on 3,087 labeled frames. On a stratified test set of 618 samples, the model reaches 99.68% classification accuracy (F1 above 0.99 per class) and a localization MAE of 0.37 pixels (RMSE 0.67 pixels). The two tasks share one forward pass and one set of convolutional weights, which keeps the design small enough to reason about on paper and, in principle, to port to an embedded runtime later.
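The combined objective can be sketched as a weighted sum of a cross-entropy term for the class head and an MSE term for the coordinate head; the weight and the toy inputs below are assumptions for illustration, not the thesis's settings:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())     # shift for numerical stability
    return e / e.sum()

def joint_loss(class_logits, true_class, pred_xy, true_xy, reg_weight=1.0):
    """Cross-entropy on the class head plus MSE on the (x, y) head;
    in the real model both heads share one forward pass."""
    ce = -np.log(softmax(class_logits)[true_class])                   # classification term
    mse = np.mean((np.asarray(pred_xy) - np.asarray(true_xy)) ** 2)  # localization term
    return ce + reg_weight * mse

loss = joint_loss(np.array([2.0, 0.5, -1.0]), 0, [0.4, 0.6], [0.5, 0.5])
```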
The experiments were conducted in a single indoor setting with a controlled heat source for the fire-related class. That scope limits how far the numbers should be generalized, yet they still suggest that joint learning on native 8×8 frames is workable when preprocessing preserves relative heat structure. On-device latency, power use, and post-training quantization are left for future work. The chapters that follow spell out the dataset, preprocessing, layer stack, training schedule, and metrics in full. The aim is a thesis someone can audit line by line rather than a high-level claim alone.
</description>
<dc:date>2026-04-01T00:00:00Z</dc:date>
</item>
<item rdf:about="https://ar.iub.edu.bd/handle/11348/1166">
<title>JewelNet: A Custom CNN and Transfer Learning–Based Approach for Fine-Grained Jewelry Classification</title>
<link>https://ar.iub.edu.bd/handle/11348/1166</link>
<description>JewelNet: A Custom CNN and Transfer Learning–Based Approach for Fine-Grained Jewelry Classification
Nishi, Jannatul Ferdous; Billa, Md. Masum
Fine-grained jewelry image classification poses a considerable challenge in computer vision due to high intra-class similarity, reflective metal surfaces, complex backgrounds, and the absence of corresponding training data. Traditional approaches using manually extracted features such as color histograms, texture descriptors, and structural attributes have proven ineffective, owing to their limited ability to discriminate between structurally similar classes such as bangles and bracelets. This research therefore proposes JewelNet, a deep learning classification framework that compares three architectural approaches: a custom CNN, transfer learning with VGG16 and ResNet50, and the EfficientNetB2 architecture. Two sets of experiments were designed within the same framework to compare the model architectures. A dataset of 1,217 images covering eight jewelry types (bangle, bracelet, chain, earring, necklace, pendant, ring, and nose pin) was assembled. To diversify the data, augmentation techniques such as rotations, flips, zooming, and brightness and channel shifts were applied, increasing the dataset to 8,519 images. Experimental evaluation shows that EfficientNetB2 achieves the best classification performance, with 95.21% accuracy, 95.33% precision, 95.06% recall, and an F1-score of 94.97%. VGG16 reaches 94.19% accuracy and the custom CNN 93.78%. Per-class analysis shows that EfficientNetB2 attains 100% recall for earrings, necklaces, pendants, and rings. Moreover, the deep features obtained from this network can support recommendation, yielding cosine similarity greater than 0.92 between similar items.
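As an illustrative sketch of that recommendation step (the short vectors below are toy stand-ins for the network's deep feature embeddings):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors, in [-1, 1]."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# two nearly aligned "deep feature" vectors score close to 1
sim = cosine_similarity([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
```

In a recommendation setting, items whose embedding similarity to a query exceeds a chosen threshold (here, 0.92) would be surfaced as visually similar.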
</description>
<dc:date>2026-04-01T00:00:00Z</dc:date>
</item>
<item rdf:about="https://ar.iub.edu.bd/handle/11348/1077">
<title>A Cross-Modal Benchmark of Deep Learning Encoders for Automated Liver Segmentation</title>
<link>https://ar.iub.edu.bd/handle/11348/1077</link>
<description>A Cross-Modal Benchmark of Deep Learning Encoders for Automated Liver Segmentation
Rahman, Redwan; Nahar, Kazi Toushia
Many clinical workflows, including liver transplant scheduling, volumetry, and the management of chronic diseases such as cirrhosis and fibrosis, rely on accurate segmentation of the liver. Incorrect segmentation, and the resulting mistracking of disease progression, poses significant risks to patient safety. Domain shift occurs when models trained on Magnetic Resonance Imaging (MRI) fail to generalize to Computed Tomography (CT); this problem seriously limits the clinical utility of such systems even though Deep Learning (DL) has demonstrated great potential in automating the task. Despite the regular introduction of state-of-the-art Unsupervised Domain Adaptation (UDA) methods, little is known about the natural cross-modality robustness of simple encoder architectures. To compare adaptability, a baseline was established using four popular encoder backbones in a standard U-Net structure: ResNet18, EfficientNetB3, DenseNet121, and MobileNetV2. Models were trained on the LiverHccSeg data (arterial-phase T1-weighted MRI) to evaluate model-specific performance differences, then tested on three external sets representing varying levels of domain shift: CHAOS (MRI), and 3D-IRCADb-01 and SLIVER07 (CT). The systematic analysis indicates that in-domain accuracy does not determine cross-modality stability. DenseNet121 achieved the best in-domain performance (Dice Similarity Coefficient [DSC] = 0.928) yet generalized poorly. Conversely, EfficientNetB3 was the most robust design, delivering the best performance on the external MRI protocol (CHAOS, DSC 0.852) and on cross-modality CT data (3D-IRCADb-01, DSC 0.825).
This suggests that compound scaling principles help the network learn the cross-modality structural representations essential for handling various liver diseases. ResNet18, in turn, demonstrated the lowest cross-modality transfer (DSC 0.769-0.788 on CT), indicating susceptibility to overfitting the intensity patterns of the training modality. Furthermore, the lightweight MobileNetV2 showed comparable generalization (DSC ∼0.81 on CT), demonstrating that strong performance and computational efficiency are compatible. The main limitations of this study are the use of 2D slice processing, which forgoes complete 3D volumetric context, and the comparatively small training cohort (n = 17). To further reduce the patient risks related to domain shift, future research should concentrate on validating these results on larger, multi-center datasets and on investigating 3D architectures. In the end, these findings suggest that choosing an appropriate encoder is an important initial step in developing cross-modal segmentation systems reliable enough for widespread clinical application.
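The Dice Similarity Coefficient reported throughout can be computed from binary masks as follows; this is the standard formulation, not necessarily the study's exact implementation:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """DSC = 2|A ∩ B| / (|A| + |B|) on binary masks; eps avoids 0/0
    when both masks are empty."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

# toy 2x2 masks: one overlapping pixel, so DSC = 2*1 / (2 + 1)
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
dsc = dice_coefficient(pred, target)
```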
</description>
<dc:date>2025-12-01T00:00:00Z</dc:date>
</item>
</rdf:RDF>
