Balancing Accuracy and Readability: Comparative Evaluation of AI Chatbots for Patient Education on Rotator Cuff Tears

dc.authorscopusid 57210134406
dc.authorscopusid 58304737100
dc.authorscopusid 59493412500
dc.authorscopusid 57241132400
dc.authorscopusid 57192870929
dc.authorwosid Ciftci, Mehmet/Nxb-9001-2025
dc.authorwosid Aloglu Ciftci, Ebru/Lxb-2688-2024
dc.authorwosid Ziroglu, Nezih/N-2480-2019
dc.authorwosid Koluman, Ali Can/Izq-0097-2023
dc.contributor.author Koluman, Ali Can
dc.contributor.author Ciftci, Mehmet Utku
dc.contributor.author Ciftci, Ebru Aloglu
dc.contributor.author Cakmur, Basar Burak
dc.contributor.author Ziroglu, Nezih
dc.date.accessioned 2025-12-15T15:28:47Z
dc.date.available 2025-12-15T15:28:47Z
dc.date.issued 2025
dc.department Okan University en_US
dc.department-temp [Koluman, Ali Can] Bakirkoy Dr Sadi Konuk Training & Res Hosp, Dept Orthoped & Traumatol, TR-34147 Istanbul, Turkiye; [Ciftci, Mehmet Utku] Sultan Abdulhamid Han Training & Res Hosp, Dept Orthoped & Traumatol, TR-34668 Istanbul, Turkiye; [Ciftci, Ebru Aloglu] Istanbul Okan Univ, Fac Hlth Sci, Div Physiotherapy & Rehabil, TR-34959 Istanbul, Turkiye; [Ciftci, Ebru Aloglu] Istinye Univ, Inst Grad Educ, Dept Physiotherapy & Rehabilitat, TR-34010 Istanbul, Turkiye; [Cakmur, Basar Burak] Beylikduzu State Hosp, Dept Orthoped & Traumatol, TR-34500 Istanbul, Turkiye; [Ziroglu, Nezih] Acibadem Mehmet Ali Aydinlar Univ, Vocat Sch Hlth Serv, Dept Orthoped Prosthet & Orthot, TR-34638 Istanbul, Turkiye; [Ziroglu, Nezih] Acibadem Univ, Atakent Hosp, Dept Orthopaed & Traumatol, TR-34303 Istanbul, Turkiye en_US
dc.description.abstract Background/Objectives: Rotator cuff (RC) tears are a leading cause of shoulder pain and disability. Artificial intelligence (AI)-based chatbots are increasingly applied in healthcare for diagnostic support and patient education, but the reliability, quality, and readability of their outputs remain uncertain. International guidelines (AMA, NIH, and European health communication frameworks) recommend that patient materials be written at a 6th-8th grade reading level, yet most online and AI-generated content exceeds this threshold. Methods: We compared responses from three AI chatbots, ChatGPT-4o (OpenAI), Gemini 1.5 Flash (Google), and DeepSeek-V3 (DeepSeek AI), to 20 frequently asked patient questions about RC tears. Four orthopedic surgeons independently rated reliability and usefulness (7-point Likert scale) and overall quality (5-point Global Quality Scale). Readability was assessed using six validated indices. Statistical analysis included Kruskal-Wallis tests and ANOVA with Bonferroni correction; inter-rater agreement was measured using intraclass correlation coefficients (ICCs). Results: Inter-rater reliability was good to excellent (ICC 0.726-0.900). Gemini 1.5 Flash achieved the highest reliability and quality, ChatGPT-4o performed comparably but scored slightly lower on diagnostic content, and DeepSeek-V3 consistently scored lowest in reliability and quality but produced the most readable text (FKGL ≈ 6.5, within the 6th-8th grade target). None of the models reached a Flesch Reading Ease (FRE) score above 60, indicating that even the most readable outputs remained more complex than plain-language standards. Conclusions: Gemini 1.5 Flash and ChatGPT-4o generated more accurate and higher-quality responses, whereas DeepSeek-V3 provided more accessible content. No single model fully balanced accuracy and readability. Clinical Implications: Hybrid use of AI platforms, pairing high-accuracy models with more readable outputs under clinician oversight, may optimize patient education by ensuring both accuracy and accessibility. Future work should assess real-world comprehension and address the legal, ethical, and generalizability challenges of AI-driven patient education. en_US
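For readers unfamiliar with the readability metrics named in the abstract (FRE and FKGL), the sketch below illustrates the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas against which the chatbot outputs were judged. This is not the study's analysis code; the syllable counter is a simplified vowel-group heuristic assumed here for illustration, so scores are approximate.

# Illustrative sketch only (not the authors' code): standard Flesch formulas.
# Syllable counting uses a rough vowel-group heuristic, so results are approximate.
import re

def count_syllables(word: str) -> int:
    # Approximate syllables as runs of consecutive vowels (at least 1 per word).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    w, s = len(words), sentences
    fre = 206.835 - 1.015 * (w / s) - 84.6 * (syllables / w)   # Flesch Reading Ease
    fkgl = 0.39 * (w / s) + 11.8 * (syllables / w) - 15.59     # Flesch-Kincaid Grade Level
    return fre, fkgl

if __name__ == "__main__":
    sample = ("A rotator cuff tear is an injury to the tendons of the shoulder. "
              "It can cause pain and weakness in the arm.")
    fre, fkgl = readability(sample)
    # FRE above 60 and FKGL in the 6-8 range meet the plain-language targets cited in the abstract.
    print(f"FRE = {fre:.1f}, FKGL = {fkgl:.1f}")

Under these formulas, longer sentences and more syllables per word lower FRE and raise FKGL, which is why the abstract reports FKGL alongside an FRE threshold of 60 as complementary plain-language targets.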
dc.description.woscitationindex Science Citation Index Expanded - Social Science Citation Index
dc.identifier.doi 10.3390/healthcare13212670
dc.identifier.issn 2227-9032
dc.identifier.issue 21 en_US
dc.identifier.pmid 41228037
dc.identifier.scopus 2-s2.0-105021435857
dc.identifier.scopusquality Q2
dc.identifier.uri https://doi.org/10.3390/healthcare13212670
dc.identifier.uri https://hdl.handle.net/20.500.14517/8613
dc.identifier.volume 13 en_US
dc.identifier.wos WOS:001612619600001
dc.identifier.wosquality Q2
dc.language.iso en en_US
dc.publisher MDPI en_US
dc.relation.ispartof Healthcare en_US
dc.relation.publicationcategory Makale - Uluslararası Hakemli Dergi - Kurum Öğretim Elemanı en_US
dc.rights info:eu-repo/semantics/openAccess en_US
dc.subject Rotator Cuff Injuries en_US
dc.subject Artificial Intelligence en_US
dc.subject Chatbots en_US
dc.subject Large Language Models en_US
dc.subject Patient Education en_US
dc.subject Health Literacy en_US
dc.subject Digital Health en_US
dc.title Balancing Accuracy and Readability: Comparative Evaluation of AI Chatbots for Patient Education on Rotator Cuff Tears en_US
dc.type Article en_US
dspace.entity.type Publication