Temporomandibular joint CBCT image segmentation via multi-view ensemble learning network

Bibliographic Details
Published in Medical & biological engineering & computing Vol. 63; no. 3; pp. 693-706
Main Authors Hu, Piaolin, Li, Jupeng, Ma, Ruohan, Zhang, Kai, Guo, Yong, Li, Gang
Format Journal Article
Language English
Published Berlin/Heidelberg Springer Berlin Heidelberg 01.03.2025
Springer Nature B.V
Summary: Accurate segmentation of the temporomandibular joint (TMJ) from cone beam CT (CBCT) images holds significant clinical value for diagnosing temporomandibular joint osteoarthrosis (TMJOA) and related conditions. Convolutional neural network-based medical image segmentation methods have achieved state-of-the-art performance in various segmentation tasks. However, 3D medical image segmentation requires substantial global context and rich spatial semantic information, which demands far more GPU memory and computational resources. To address these challenges, we propose a novel network, MVEL-Net (Multi-view Ensemble Learning Network), for TMJ CBCT image segmentation. By resampling images along three dimensions, we generate multiple weak learners with different spatial semantic information. A subsequent strong learning network integrates the outputs of these weak learners to achieve more accurate segmentation results. We evaluated our network model using a clinical dataset comprising 88 subjects with TMJ CBCT images. The average Dice similarity coefficient (DSC) was 0.9817 ± 0.0049, the average surface distance was 0.0540 ± 0.0179 mm, and the 95% Hausdorff distance was 0.1743 ± 0.0550 mm. The proposed MVEL-Net demonstrates excellent segmentation performance on TMJ from CBCT images while using less GPU memory than other 3D networks. The effectiveness of this method in capturing spatial context could be leveraged for tasks such as organ segmentation from volumetric scans, which may facilitate wider adoption of AI-based solutions for automated analysis of 3D medical images.

Graphical Abstract: Framework of the proposed MVEL-Net architecture. The high-resolution 3D medical image is down-sampled in multiple directions, resulting in three images that preserve layer details. These down-sampled images, along with an isotropically down-sampled image, {I_t, I_s, I_c, I_d}, serve as inputs to the weak learner networks Net-1 to Net-4, which produce rough segmentations of the image in the first stage. The four inference results, up-sampled to the original 3D medical image resolution, are then combined with the original image and fed into the strong learner network Net-5 for full-scale segmentation of the entire 3D medical image.
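The data flow in the graphical abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the down-sampling factor, the choice of which axis each view coarsens, and the use of strided subsampling and nearest-neighbour up-sampling are all assumptions made for illustration, and the networks Net-1 to Net-5 themselves are omitted.

```python
import numpy as np

def make_multiview_inputs(volume, factor=4):
    """Build four down-sampled copies of a 3D volume.

    Assumption: each directional view is coarsened along a single axis
    and keeps full resolution on the other two; the record does not
    specify the exact resampling scheme.
    """
    f = factor
    return {
        "I_t": volume[::f, :, :],      # coarse along axis 0 only
        "I_s": volume[:, ::f, :],      # coarse along axis 1 only
        "I_c": volume[:, :, ::f],      # coarse along axis 2 only
        "I_d": volume[::f, ::f, ::f],  # isotropic down-sampling
    }

def upsample_to(pred, shape):
    """Nearest-neighbour up-sampling of a coarse array back to `shape`."""
    out = pred
    for axis, target in enumerate(shape):
        reps = -(-target // out.shape[axis])  # ceiling division
        out = np.repeat(out, reps, axis=axis)
        out = np.take(out, np.arange(target), axis=axis)  # crop overshoot
    return out

def stack_for_strong_learner(volume, coarse_preds):
    """Stack the original image with the up-sampled coarse predictions
    as channels, giving an array of shape (1 + len(coarse_preds), D, H, W)."""
    ups = [upsample_to(p, volume.shape) for p in coarse_preds]
    return np.stack([volume, *ups], axis=0)
```

In this sketch the outputs of four weak learners would be passed as `coarse_preds`; the resulting five-channel array stands in for the combined input that the summary describes feeding to the strong learner.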
ISSN: 0140-0118
1741-0444
DOI: 10.1007/s11517-024-03225-6