Temporomandibular joint CBCT image segmentation via multi-view ensemble learning network
Published in | Medical & biological engineering & computing Vol. 63; no. 3; pp. 693 - 706 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | Berlin/Heidelberg, Springer Berlin Heidelberg, 01.03.2025, Springer Nature B.V |
Summary: | Accurate segmentation of the temporomandibular joint (TMJ) from cone beam CT (CBCT) images holds significant clinical value for diagnosing temporomandibular joint osteoarthrosis (TMJOA) and related conditions. Convolutional neural network-based medical image segmentation methods have achieved state-of-the-art performance in various segmentation tasks. However, 3D medical image segmentation requires substantial global context and rich spatial semantic information, which demands much more GPU memory and computational resources. To address these challenges, we propose a novel network, MVEL-Net (Multi-view Ensemble Learning Network), for TMJ CBCT image segmentation. By resampling images along three dimensions, we generate multiple weak learners with different spatial semantic information. A subsequent strong learning network integrates the outputs from these weak learners to achieve more accurate segmentation results. We evaluated the model on a clinical dataset comprising 88 subjects with TMJ CBCT images. The average Dice similarity coefficient (DSC) was 0.9817 ± 0.0049, the average surface distance was 0.0540 ± 0.0179 mm, and the 95% Hausdorff distance was 0.1743 ± 0.0550 mm. MVEL-Net demonstrates excellent segmentation performance on TMJ from CBCT images while using less GPU memory than other 3D networks. Its effectiveness in capturing spatial context could be leveraged for tasks such as organ segmentation from volumetric scans, which may facilitate wider adoption of AI-based solutions for automated analysis of 3D medical images.
Graphical Abstract
The proposed framework of the MVEL-Net network architecture. The high-resolution 3D medical image is down-sampled in multiple directions, resulting in three images that preserve layer details. These down-sampled images, along with an isotropic down-sampled image, $\{I_t, I_s, I_c, I_d\}$, serve as inputs to weak learner networks Net-1 to Net-4, which produce rough segmentations of the image in the first stage. The four inference results, up-sampled to the original 3D medical image resolution, are then combined with the original image and fed into the strong learner network Net-5 for full-scale segmentation of the entire 3D medical image. |
---|---|
ISSN: | 0140-0118 1741-0444 |
DOI: | 10.1007/s11517-024-03225-6 |
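
The data flow described in the summary and graphical abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the axis order, the down-sampling factor of 4, the strided slicing and nearest-neighbour up-sampling, and the names `downsample_views`, `upsample_to`, `rough_segment`, `build_strong_input` and `dice` are all assumptions made here. In particular, `rough_segment` is a hypothetical mean-threshold stand-in for the weak learner networks Net-1 to Net-4, and the strong learner Net-5 is not modelled at all; only its stacked input is built.

```python
import numpy as np


def downsample_views(volume, factor=4):
    """Return four coarse copies of a 3D volume.

    Assumption: each directional view keeps full resolution along exactly
    one axis and is strided along the other two; I_d is strided along all
    three. Strided slicing stands in for proper resampling.
    """
    f = factor
    return {
        "I_t": volume[:, ::f, ::f],    # full resolution along axis 0
        "I_s": volume[::f, :, ::f],    # full resolution along axis 1
        "I_c": volume[::f, ::f, :],    # full resolution along axis 2
        "I_d": volume[::f, ::f, ::f],  # isotropic down-sampling
    }


def upsample_to(volume, shape):
    """Nearest-neighbour up-sampling back to `shape` by repeating voxels."""
    out = volume
    for axis, target in enumerate(shape):
        reps = -(-target // out.shape[axis])  # ceiling division
        out = np.repeat(out, reps, axis=axis)
    # Crop any overshoot when the size is not a multiple of the factor.
    return out[: shape[0], : shape[1], : shape[2]]


def rough_segment(view):
    """Hypothetical stand-in for a weak learner: threshold at the mean."""
    return (view > view.mean()).astype(np.float32)


def build_strong_input(volume, factor=4):
    """Stack the original volume with the four up-sampled rough masks."""
    views = downsample_views(volume, factor)
    masks = [upsample_to(rough_segment(v), volume.shape) for v in views.values()]
    return np.stack([volume.astype(np.float32)] + masks, axis=0)


def dice(pred, truth):
    """Dice similarity coefficient between two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / total
```

Under these assumptions, `build_strong_input` returns a five-channel array (the original image plus four rough masks) at the original resolution, which is the kind of input the graphical abstract describes for Net-5. The `dice` function computes the overlap measure that the summary reports as DSC.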