Semi-Supervised 3D Medical Segmentation from
2D Natural Images Pretrained Model

MLMI 2025 (Oral)
Pak Hei Yeung1,2
Jayroop Ramesh2
Pengfei Lyu3
Ana I.L. Namburete2
Jagath Rajapakse1
1 College of Computing and Data Science, Nanyang Technological University, Singapore
2 Oxford Machine Learning in NeuroImaging Lab, University of Oxford, United Kingdom
3 Faculty of Robot Science and Engineering, Northeastern University, China
[Paper] [Code] [Presentation (YouTube)] [Bibtex]
M&N teaser

Abstract

This paper explores the transfer of knowledge from general vision models pretrained on 2D natural images to improve 3D medical image segmentation. We focus on the semi-supervised setting, where only a few labeled 3D medical images are available, along with a large set of unlabeled images. To tackle this, we propose a model-agnostic framework that progressively distills knowledge from a 2D pretrained model to a 3D segmentation model trained from scratch. Our approach, M&N, involves iterative co-training of the two models using pseudo-masks generated by each other, along with our proposed learning rate guided sampling that adaptively adjusts the proportion of labeled and unlabeled data in each training batch to align with the models' prediction accuracy and stability, minimizing the adverse effect caused by inaccurate pseudo-masks. Extensive experiments on multiple publicly available datasets demonstrate that M&N achieves state-of-the-art performance, outperforming thirteen existing semi-supervised segmentation approaches across all evaluated settings. Importantly, ablation studies show that M&N remains model-agnostic, allowing seamless integration with different architectures. This ensures its adaptability as more advanced models emerge.
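For intuition, here is a minimal Python sketch of how learning rate guided (LRG) sampling could map the current learning rate to a batch composition. The function name, its arguments, and the linear schedule are illustrative assumptions for this page, not the paper's exact rule.

def lrg_batch_split(lr: float, lr_max: float, batch_size: int, min_labeled: int = 1):
    """Return (n_labeled, n_unlabeled) for the next batch (illustrative sketch).

    Assumed rule, not the paper's exact formula: early in training (lr close
    to lr_max) predictions are unstable, so batches lean on labeled data; as
    lr decays and pseudo-masks stabilize, the unlabeled share grows.
    """
    unlabeled_frac = 1.0 - lr / lr_max              # 0 at the start, approaches 1 as lr decays
    n_unlabeled = round(unlabeled_frac * batch_size)
    n_labeled = max(min_labeled, batch_size - n_unlabeled)
    return n_labeled, batch_size - n_labeled

For example, with batch_size=8 this yields (8, 0) at lr == lr_max and (2, 6) once lr has decayed to 0.25 * lr_max.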


Pipeline

Pipeline
Pipeline of our proposed M&N framework. The 2D and 3D models are iteratively co-trained: each model learns from pseudo-masks generated by the other through the unlabeled loss (\(\mathcal{L}_{unlabeled}\)), and from labeled images and their masks through the labeled loss (\(\mathcal{L}_{labeled}\)). This iterative process alternates between odd and even epochs. LRG-sampling dynamically adjusts the proportion of labeled and unlabeled data in each batch based on the current learning rate, optimizing the utilization of available training data.
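Below is a hedged PyTorch sketch of one such co-training step. The slice-and-restack handling of the 2D model, all function and argument names, and the exact loss composition are assumptions made for illustration; they follow this caption, not the released implementation.

import torch

def co_training_step(epoch, vol_l, mask_l, vol_u,
                     model_2d, model_3d, opt_2d, opt_3d,
                     labeled_loss, unlabeled_loss):
    """One alternating co-training step (illustrative sketch)."""

    def predict_2d(vol):
        # Apply the 2D model slice-by-slice to a (B, C, D, H, W) volume
        # and restack the per-slice logits into (B, K, D, H, W).
        b, _, d, _, _ = vol.shape
        slices = vol.permute(0, 2, 1, 3, 4).flatten(0, 1)   # (B*D, C, H, W)
        logits = model_2d(slices)                           # (B*D, K, H, W)
        return logits.unflatten(0, (b, d)).permute(0, 2, 1, 3, 4)

    # Odd and even epochs swap which model teaches and which one learns.
    if epoch % 2 == 0:
        student, teacher, opt = model_3d, predict_2d, opt_3d
    else:
        student, teacher, opt = predict_2d, model_3d, opt_2d

    with torch.no_grad():                     # teacher generates pseudo-masks
        pseudo_mask = teacher(vol_u).argmax(dim=1)

    loss = (labeled_loss(student(vol_l), mask_l)
            + unlabeled_loss(student(vol_u), pseudo_mask))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()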

Quantitative Results

Quantitative Results
Our proposed M&N outperforms 13 existing semi-supervised segmentation approaches across different datasets and low-labeled-data settings. ↑ marks metrics where higher is better, and ↓ the reverse.

Qualitative Results

Qualitative Results
Qualitative results of 3D segmentation on the Pancreas-CT dataset, trained with 6 labeled images. *SegFormer is trained from scratch; **SegFormer is pretrained on 2D natural images (ADE20K).

Ablation Studies

Ablation Studies
Ablation studies on the MRI Left Atrial Cavity dataset with 8 labeled images. Using different 2D and 3D models in M&N surpasses the state-of-the-art AD-MT [29], verifying the model-agnostic nature of M&N. The proposed iterative co-training and LRG-sampling of M&N both contribute to a significant improvement in performance.

Bibtex

@inproceedings{yeung2025mn,
	title={Semi-Supervised 3D Medical Segmentation from 2D Natural Images Pretrained Model},
	author={Yeung, Pak-Hei and Ramesh, Jayroop and Lyu, Pengfei and Namburete, Ana and Rajapakse, Jagath C},
	booktitle={Machine Learning in Medical Imaging (MLMI)},
	year={2025}
}

Acknowledgements

Pak Hei Yeung is funded by the Presidential Postdoctoral Fellowship from Nanyang Technological University. We thank Dr Madeleine Wyburd and Mr Valentin Bacher for their valuable suggestions and comments about the work. The template of this project webpage was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project; the code can be found here.