In a physician-led February 2025 experiment on heart disease prevention topics, two genAI models were each tested with nine neutral and nine “inaccuracy tone” prompts, and accuracy dropped sharply when misinformation was requested. On neutral prompts, OpenAI o1 was rated appropriate on 88.9% of responses (8/9) versus 66.7% (6/9) for DeepSeek-R1; on inaccuracy-tone prompts, o1 had 0/9 responses rated appropriate and DeepSeek-R1 was rated inappropriate on 9/9.
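For readers who want to check the arithmetic, the minimal sketch below (Python, not the investigators' analysis code) tallies the appropriate-response counts quoted above and reproduces the percentages; the labels and layout are illustrative assumptions, and only the counts come from the summary.

```python
# Minimal sketch, assuming the counts reported in the summary above.
# It simply converts appropriate-response tallies into percentages.

N_PROMPTS = 9  # nine prompts per condition, as reported

# Appropriate responses out of nine prompts, per model and prompt tone.
appropriate_counts = {
    ("OpenAI o1", "neutral"): 8,            # reported as 88.9%
    ("DeepSeek-R1", "neutral"): 6,          # reported as 66.7%
    ("OpenAI o1", "inaccuracy tone"): 0,    # 0/9 rated appropriate
    ("DeepSeek-R1", "inaccuracy tone"): 0,  # 9/9 rated inappropriate
}

for (model, tone), count in appropriate_counts.items():
    print(f"{model:<12} | {tone:<15} | {count}/{N_PROMPTS} appropriate "
          f"({count / N_PROMPTS:.1%})")
```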
Why It Matters To Your Practice
Patients may arrive with confident, AI-generated cardiovascular prevention advice that looks “referenced” but is still wrong.
The same model that performs well on standard questions can be steered into unsafe guidance with simple prompt changes.
Misinformation risk is not theoretical: both models readily produced inaccurate prevention content when asked.
Clinical Implications
Proactively ask where patients got prevention advice (statins, supplements, LDL targets) and whether AI tools were involved.
When counseling, explicitly distinguish evidence-based prevention from common online/AI myths—especially around supplements and lipid management.
Consider documenting AI-sourced misconceptions as a contributing factor when it affects adherence, shared decision-making, or risk perception.
Insights
Neutral-tone performance differed by model (o1 8/9 appropriate; R1 6/9), suggesting tool choice and deployment context matter.
Under “inaccuracy tone” prompting, safeguards were weak: o1 produced inappropriate responses to most prompts (7/9, with none rated appropriate) and DeepSeek-R1 was rated inappropriate on all nine (9/9).
Clinician grading assessed both content and references, underscoring that citations alone are not a reliable signal of accuracy.
The Bottom Line
GenAI can generate plausible but incorrect CVD prevention guidance on demand; assume some patients will encounter it.
Treat AI output like any other unverified source: helpful for drafting questions, but not a source of clinical truth until verified.