🔎 ChatGPT trailed peers in cardiac preop assessment
In an exploratory study of 41 consecutive patients with severe heart disease undergoing noncardiac surgery, ChatGPT posted the lowest accuracy across key preop anesthesia tasks (58.5% for ASA classification, 46.3% for both NYHA and RCRI, and 51.2% for pulmonary risk), behind DeepSeek and Grok. Agreement between the three LLMs and a five-anesthesiologist expert panel ranged from slight to substantial, and the study concluded that none of the models was ready for direct clinical use in complex cardiac preoperative assessment.
Why It Matters To Your Practice
Cardiac preop assessment is a high-stakes workflow where errors in ASA class, functional status, or risk scoring can alter anesthesia planning and perioperative management.
This comparison suggests LLM performance can vary meaningfully by model, so “AI assistance” should not be treated as interchangeable at the bedside.
Across models, consistent over-recommendation of invasive monitoring and omission of BIS monitoring point to systematic bias, not just random mistakes.
Clinical Implications
If clinicians use LLMs for perioperative support, the safer role today is as a supervised checklist or second-opinion generator, not an autonomous decision-maker.
Outputs on ASA, NYHA, RCRI, pulmonary risk, and anesthetic planning still need direct verification against guideline-based assessment and specialist judgment.
Be especially cautious in patients with severe heart disease, where nuanced context from records, symptoms, and procedure-specific risk may exceed current model reliability.
Insights
The study tested the models on standardized vignettes based on real records from 41 consecutive cardiac patients and benchmarked their outputs against a structured expert consensus from five senior anesthesiologists.
DeepSeek led ASA accuracy at 73.2%, while Grok led NYHA, RCRI, and pulmonary risk at 75.6%, 75.6%, and 80.5%, respectively; ChatGPT was lowest on each of those tasks.
All models favored general anesthesia in at least 85% of cases, overemphasized IBP and CVP monitoring, and none mentioned BIS monitoring.
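To put the percentages in terms of patients, the reported accuracies can be converted back to case counts out of 41. The short Python sketch below does that arithmetic. The counts are inferred from the rounded percentages in this summary, not taken from the study itself, and the function and variable names are illustrative.

```python
# Back-calculate implied case counts from the reported accuracy percentages.
# Assumption: every percentage is a simple proportion of the 41 vignettes,
# rounded to one decimal place. The counts are inferred, not reported.
N_PATIENTS = 41

# (model, task) -> accuracy in percent, as quoted in this summary
reported = {
    ("ChatGPT", "ASA"): 58.5,
    ("ChatGPT", "NYHA"): 46.3,
    ("ChatGPT", "RCRI"): 46.3,
    ("ChatGPT", "Pulmonary risk"): 51.2,
    ("DeepSeek", "ASA"): 73.2,
    ("Grok", "NYHA"): 75.6,
    ("Grok", "RCRI"): 75.6,
    ("Grok", "Pulmonary risk"): 80.5,
}


def implied_count(pct, n=N_PATIENTS):
    """Return k such that k/n matches pct to one decimal place, else None."""
    k = round(pct * n / 100)
    # Accept k only if it reproduces the reported figure within rounding error.
    if abs(100 * k / n - pct) <= 0.05:
        return k
    return None


if __name__ == "__main__":
    for (model, task), pct in reported.items():
        k = implied_count(pct)
        print(f"{model:8s} {task:15s} {pct:5.1f}%  ~ {k}/{N_PATIENTS} cases")
```

Under that assumption, ChatGPT's 58.5% on ASA corresponds to 24 of 41 cases against DeepSeek's 30 of 41, and its 46.3% on NYHA and RCRI corresponds to 19 of 41 against Grok's 31 of 41. With a sample this small, a single case shifts accuracy by about 2.4 percentage points.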
The Bottom Line
For complex noncardiac surgery in patients with heart disease, current LLMs can structure thinking but should not replace clinician-led preop anesthesia assessment.
The near-term opportunity is workflow support under expert supervision, pending stronger multicenter validation and better calibration to perioperative standards.