A Cross-Modal Benchmark of Deep Learning Encoders for Automated Liver Segmentation
Abstract
Many clinical workflows, including liver transplant scheduling, volumetry, and the management of chronic diseases such as cirrhosis and fibrosis, rely on accurate segmentation of the liver. An incorrect segmentation poses significant risks to patient safety, for example through inaccurate tracking of disease progression. Although Deep Learning (DL) has demonstrated great potential in automating this task, domain shift, in which models trained on Magnetic Resonance Imaging (MRI) fail to generalize to Computed Tomography (CT), remains a serious limitation on the clinical utility of such systems in healthcare facilities. While state-of-the-art Unsupervised Domain Adaptation (UDA) methods are introduced regularly, little is known about the inherent cross-modality robustness of standard encoder architectures. To compare their adaptability, a baseline was established using four popular encoder backbones, ResNet18, EfficientNetB3, DenseNet121, and MobileNetV2, within a standard U-Net architecture. Models were trained on the LiverHccSeg dataset (arterial-phase T1-weighted MRI) to evaluate model-specific performance differences and were subsequently tested on three external sets representing varying degrees of domain shift: CHAOS (MRI), and 3D-IRCADb-01 and SLIVER07 (CT). The systematic analysis indicates that in-domain accuracy does not predict cross-modality stability. DenseNet121 achieved the best performance on the in-domain test set (Dice Similarity Coefficient [DSC] = 0.928) yet generalized poorly. Conversely, EfficientNetB3 was the most robust design, delivering the best performance on both the external MRI protocol (CHAOS, DSC 0.852) and cross-modality CT data (3D-IRCADb-01, DSC 0.825).
This suggests that compound scaling principles promote modality-invariant structural representations, which are essential for handling diverse liver pathologies. ResNet18, in turn, showed the weakest cross-modality transfer (DSC 0.769-0.788 on CT), indicating that it overfits intensity patterns specific to the training modality. Furthermore, the lightweight MobileNetV2 showed comparable generalization (DSC ≈ 0.81 on CT), demonstrating that strong performance and computational efficiency are compatible. The main limitations of this study are the use of 2D slice processing, which discards full 3D volumetric context, and the comparatively small training cohort (n = 17). To further reduce the patient risks associated with domain shift, future research should focus on validating these results on larger, multi-center datasets and on investigating 3D architectures. Ultimately, these findings suggest that choosing an appropriate encoder is an important first step in developing cross-modal segmentation systems that are reliable enough for widespread clinical application.
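For reference, the Dice Similarity Coefficient (DSC) reported throughout is defined as 2|A∩B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. The following is a minimal generic sketch of the metric on flattened binary masks, not the thesis's actual evaluation code:

```python
def dice_coefficient(pred, target):
    """Dice Similarity Coefficient between two binary masks.

    pred, target: flat sequences of 0/1 labels (e.g. liver vs. background
    per pixel). Returns 1.0 for two empty masks by convention.
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total

# Toy example: 8-pixel masks whose 4 foreground pixels overlap on 3 positions.
pred   = [0, 1, 1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 1, 0, 0]
print(dice_coefficient(pred, target))  # 2*3 / (4+4) = 0.75
```

A DSC of 1.0 indicates perfect overlap, so values such as 0.928 (in-domain) versus 0.769-0.825 (cross-modality CT) quantify the generalization gap discussed above.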
Collections
- Undergraduate Thesis