BClear - Majority-vote contours yielded best segmentation model

In a study of 801 endoscopic images from 24 patients undergoing high-dose-rate (HDR) brachytherapy for rectal tumors, a DeepLabV3 model trained on majority-vote contours delivered the best segmentation performance (average Dice 0.77), outperforming models trained on individual experts' annotations. The same study found much lower inter-observer agreement for ulcers and radiation proctitis (average Dice 0.36 and 0.57) than for tumors (0.83), underscoring how disagreement over the ground truth can shape AI performance.

Why It Matters To Your Practice

Observer variability remains a major issue in endoscopic delineation during rectal brachytherapy workflows.
AI segmentation performance depends heavily on the quality and consistency of clinician annotations used for training.
Consensus-derived labels may produce stronger models than labels from any single expert.
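
For readers curious how a consensus label can be built, here is a minimal sketch of pixel-wise majority voting. It assumes each expert's contour is stored as a binary mask (1 = lesion, 0 = background) of identical size. The function name `majority_vote` and the toy masks are hypothetical illustrations, not the study's actual pipeline, which is not described here.

```python
def majority_vote(masks):
    """Combine binary masks pixel by pixel.

    A pixel is 1 in the result when more than half of the
    annotators marked it as 1. Assumes all masks share one shape.
    """
    if not masks:
        raise ValueError("need at least one mask")
    n_annotators = len(masks)
    n_rows, n_cols = len(masks[0]), len(masks[0][0])
    consensus = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            votes = sum(mask[r][c] for mask in masks)
            # Strict majority: with three annotators, two votes suffice.
            row.append(1 if 2 * votes > n_annotators else 0)
        consensus.append(row)
    return consensus


# Toy 2x3 masks standing in for three experts' contours (made-up data).
expert_a = [[1, 1, 0],
            [0, 1, 0]]
expert_b = [[1, 0, 0],
            [0, 1, 1]]
expert_c = [[1, 1, 0],
            [0, 0, 1]]

consensus = majority_vote([expert_a, expert_b, expert_c])
print(consensus)  # [[1, 1, 0], [0, 1, 1]]
```

With three annotators, every pixel has a strict majority, so no tie-breaking rule is needed. An even number of annotators would require one, and this sketch would resolve ties as background.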

Clinical Implications

The majority-vote model performed best overall, but it also produced frequent false positives by misclassifying ulcers and radiation proctitis as tumors.
Automated contours may therefore improve efficiency, but they still require clinician review before they inform treatment decisions or adaptive planning.
Annotators tended to score models trained on their own contours more favorably on unseen images, suggesting user-specific preferences may affect adoption.

Insights

Three expert annotators labeled tumors, scarring, ulcers, and radiation proctitis across 801 images from 24 patients.
Inter-observer agreement was highest for tumors (average Dice 0.83) and much lower for ulcers (0.36) and radiation proctitis (0.57).
Intra-observer Dice scores after 6 months were 0.72, 0.68, and 0.87, showing that even the same clinician may contour differently over time.
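
The agreement figures above are Dice coefficients, defined for two masks A and B as 2|A ∩ B| / (|A| + |B|), where 1.0 means identical contours and 0 means no overlap. The sketch below shows one plausible way to compute an average pairwise Dice across annotators. The helper names, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details taken from the study.

```python
from itertools import combinations


def dice(mask_a, mask_b):
    """Dice coefficient between two binary masks of the same shape."""
    a = [p for row in mask_a for p in row]
    b = [p for row in mask_b for p in row]
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2 * intersection / total


def mean_pairwise_dice(masks):
    """Average Dice over every pair of annotators (needs at least two)."""
    pairs = list(combinations(masks, 2))
    return sum(dice(a, b) for a, b in pairs) / len(pairs)


# Toy 2x3 masks standing in for three annotators (made-up data).
annotator_1 = [[1, 1, 0],
               [0, 1, 0]]
annotator_2 = [[1, 0, 0],
               [0, 1, 1]]
annotator_3 = [[1, 1, 0],
               [0, 0, 1]]

agreement = mean_pairwise_dice([annotator_1, annotator_2, annotator_3])
print(round(agreement, 2))  # 0.67
```

The same `dice` function could score intra-observer consistency by comparing one annotator's original mask with a later re-annotation of the same image.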

The Bottom Line

For clinicians evaluating AI-assisted endoscopic segmentation, consensus annotations appear to be the most reliable training target.
But strong average Dice scores do not eliminate clinically important false positives, so human oversight remains essential.
Automated contouring could support AI-assisted adaptive brachytherapy workflows, provided models are validated against real-world edge cases such as ulcers and radiation proctitis that resemble tumors.
