🔬 AI risk scores from prior mammograms hint at tumor grade
In a retrospective study, an FDA-approved AI model applied to prior-year screening mammograms from 1,509 women across four U.S. states showed only modest ability to predict biopsy-confirmed cancer (AUC 0.62). Among the 508 cancers, higher prior-year scores were linked to lower-grade tumors: Grade 3 tumors had lower prior-year AI risk scores than Grade 1 tumors, suggesting the model may detect subtle imaging features of low-grade malignancy before clinical detection.
Why It Matters To Your Practice
AI risk scores from earlier mammograms may reflect biologic behavior, not just near-term cancer presence.
If validated, these tools could help clinicians better understand which cancers are more likely to leave subtle imaging signals a year before diagnosis.
The study also adds to explainability: higher AI scores may be more closely associated with low-grade disease than with more aggressive tumors.
Clinical Implications
Do not treat these scores as stand-alone cancer detectors: discrimination was modest, with an AUC of 0.62.
A low score should not be read as reassurance against higher-grade disease, since Grade 3 tumors trended toward lower prior-year risk scores.
For breast imagers and referring clinicians, AI outputs may eventually be more useful for contextual risk stratification than for binary rule-in/rule-out decisions.
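For readers who want a feel for what an AUC of 0.62 means, the short Python sketch below treats AUC as the share of malignant-benign pairs in which the malignant case received the higher score. It is an illustration only: the function name `pairwise_auc` and all of the scores are hypothetical and are not taken from the study.

```python
from itertools import product


def pairwise_auc(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs in which the case
    scores higher than the control; tied pairs count as half."""
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores for illustration only; these are NOT study data.
malignant = [0.30, 0.55, 0.38, 0.70, 0.45]
benign = [0.25, 0.50, 0.35, 0.60, 0.40]

auc = pairwise_auc(malignant, benign)
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 corresponds to chance and 1.0 to perfect separation. At roughly 0.6, the malignant case outranks the benign case in only about 6 of every 10 pairs, which is why the score should not serve as a stand-alone detector.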
Insights
The cohort included women biopsied after a 2021 screening mammogram; 33.7% had a biopsy-confirmed malignant neoplasm.
In univariate analysis, invasive lobular carcinoma showed higher prior-year scores than ductal carcinoma in situ, but that association was no longer significant after adjustment for grade.
This suggests tumor grade may mediate some of what the AI model is detecting on prior mammograms.
The Bottom Line
Prior-year mammography AI risk scores may pick up faint imaging patterns linked to low-grade breast cancer about 1 year before diagnosis.
That is promising for understanding model behavior, but current performance is not strong enough to justify relying on these scores in practice.
For now, think of this as an explainability and hypothesis-generating signal, not a practice-changing screening endpoint.