Large Language Models (LLMs) are emerging as powerful tools for clinical decision support, with potential to match human expertise in internal medicine.
Why it matters:
Understanding the effectiveness of AI in clinical settings is crucial for clinicians navigating its integration into practice.
LLMs could improve diagnostic accuracy and speed up decision-making, both of which directly affect patient outcomes.
Clinical implications:
In a study conducted at the University Hospital Split, AI models exhibited significant potential in emergency settings.
The "o1" model achieved a mean final rating of 3.63, statistically equivalent to that of human physicians, reflecting its advanced reasoning capabilities.
Despite errors in therapy planning, models such as Claude-3.5-Sonnet and Llama-3.2-70B maintained ≥90% diagnostic accuracy, suggesting they can be reliable as decision-support tools.
Study insights:
The assessment involved 73 anonymized patient cases evaluated by independent specialists.
The "o1" model correctly classified 100% of abnormal lab values, showing how accurately it interprets laboratory data.
Other models were slightly less accurate on the same task but still performed strongly, at 99.5% and 99% accuracy.
What's next:
As AI technology progresses, further validation in diverse clinical scenarios is needed to build trust and refine integration strategies.
Ongoing studies will determine how best to incorporate these tools into everyday clinical workflows.