Balancing Accuracy and Readability: Comparative Evaluation of AI Chatbots for Patient Education on Rotator Cuff Tears
| dc.authorscopusid | 57210134406 | |
| dc.authorscopusid | 58304737100 | |
| dc.authorscopusid | 59493412500 | |
| dc.authorscopusid | 57241132400 | |
| dc.authorscopusid | 57192870929 | |
| dc.authorwosid | Ciftci, Mehmet/Nxb-9001-2025 | |
| dc.authorwosid | Aloglu Ciftci, Ebru/Lxb-2688-2024 | |
| dc.authorwosid | Ziroglu, Nezih/N-2480-2019 | |
| dc.authorwosid | Koluman, Ali Can/Izq-0097-2023 | |
| dc.contributor.author | Koluman, Ali Can | |
| dc.contributor.author | Ciftci, Mehmet Utku | |
| dc.contributor.author | Ciftci, Ebru Aloglu | |
| dc.contributor.author | Cakmur, Basar Burak | |
| dc.contributor.author | Ziroglu, Nezih | |
| dc.date.accessioned | 2025-12-15T15:28:47Z | |
| dc.date.available | 2025-12-15T15:28:47Z | |
| dc.date.issued | 2025 | |
| dc.department | Okan University | en_US |
| dc.department-temp | [Koluman, Ali Can] Bakirkoy Dr Sadi Konuk Training & Res Hosp, Dept Orthoped & Traumatol, TR-34147 Istanbul, Turkiye; [Ciftci, Mehmet Utku] Sultan Abdulhamid Han Training & Res Hosp, Dept Orthoped & Traumatol, TR-34668 Istanbul, Turkiye; [Ciftci, Ebru Aloglu] Istanbul Okan Univ, Fac Hlth Sci, Div Physiotherapy & Rehabil, TR-34959 Istanbul, Turkiye; [Ciftci, Ebru Aloglu] Istinye Univ, Inst Grad Educ, Dept Physiotherapy & Rehabilitation, TR-34010 Istanbul, Turkiye; [Cakmur, Basar Burak] Beylikduzu State Hosp, Dept Orthoped & Traumatol, TR-34500 Istanbul, Turkiye; [Ziroglu, Nezih] Acibadem Mehmet Ali Aydinlar Univ, Vocat Sch Hlth Serv, Dept Orthoped Prosthet & Orthot, TR-34638 Istanbul, Turkiye; [Ziroglu, Nezih] Acibadem Univ, Atakent Hosp, Dept Orthopaed & Traumatol, TR-34303 Istanbul, Turkiye | en_US |
| dc.description.abstract | Background/Objectives: Rotator cuff (RC) tears are a leading cause of shoulder pain and disability. Artificial intelligence (AI)-based chatbots are increasingly applied in healthcare for diagnostic support and patient education, but the reliability, quality, and readability of their outputs remain uncertain. International guidelines (AMA, NIH, European health communication frameworks) recommend that patient materials be written at a 6th-8th grade reading level, yet most online and AI-generated content exceeds this threshold. Methods: We compared responses from three AI chatbots (ChatGPT-4o by OpenAI, Gemini 1.5 Flash by Google, and DeepSeek-V3 by DeepSeek AI) to 20 frequently asked patient questions about RC tears. Four orthopedic surgeons independently rated reliability and usefulness (7-point Likert scale) and overall quality (5-point Global Quality Scale). Readability was assessed using six validated indices. Statistical analysis included Kruskal-Wallis and ANOVA tests with Bonferroni correction; inter-rater agreement was measured using intraclass correlation coefficients (ICCs). Results: Inter-rater reliability was good to excellent (ICC 0.726-0.900). Gemini 1.5 Flash achieved the highest reliability and quality, ChatGPT-4o performed comparably but slightly lower in diagnostic content, and DeepSeek-V3 consistently scored lowest in reliability and quality but produced the most readable text (FKGL ≈ 6.5, within the 6th-8th grade target). None of the models reached a Flesch Reading Ease (FRE) score above 60, indicating that even the most readable outputs remained more complex than plain-language standards. Conclusions: Gemini 1.5 Flash and ChatGPT-4o generated more accurate and higher-quality responses, whereas DeepSeek-V3 provided more accessible content. No single model fully balanced accuracy and readability. Clinical Implications: Hybrid use of AI platforms, leveraging high-accuracy models alongside more readable outputs under clinician oversight, may optimize patient education by ensuring both accuracy and accessibility. Future work should assess real-world comprehension and address the legal, ethical, and generalizability challenges of AI-driven patient education. | en_US |
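The abstract's readability thresholds (FKGL within the 6th-8th grade band, FRE above 60 for plain language) come from the standard Flesch formulas. As a minimal sketch of how such scores are computed, not the study's actual tooling, the following uses the published FRE and FKGL coefficients with a naive vowel-group syllable heuristic; the function names and the syllable counter are illustrative assumptions.

```python
import re


def count_syllables(word: str) -> int:
    """Naive syllable estimate: count vowel groups (illustrative heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    # A trailing silent 'e' usually does not add a syllable.
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(1, n)


def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw    # Flesch Reading Ease
    fkgl = 0.39 * wps + 11.8 * spw - 15.59      # Flesch-Kincaid Grade Level
    return fre, fkgl
```

Under these formulas, short sentences with mostly one-syllable words push FRE well above 60 and FKGL toward early grade levels, which is why the study's finding that no model exceeded FRE 60 indicates text more complex than plain-language targets.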
| dc.description.woscitationindex | Science Citation Index Expanded - Social Science Citation Index | |
| dc.identifier.doi | 10.3390/healthcare13212670 | |
| dc.identifier.issn | 2227-9032 | |
| dc.identifier.issue | 21 | en_US |
| dc.identifier.pmid | 41228037 | |
| dc.identifier.scopus | 2-s2.0-105021435857 | |
| dc.identifier.scopusquality | Q2 | |
| dc.identifier.uri | https://doi.org/10.3390/healthcare13212670 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.14517/8613 | |
| dc.identifier.volume | 13 | en_US |
| dc.identifier.wos | WOS:001612619600001 | |
| dc.identifier.wosquality | Q2 | |
| dc.language.iso | en | en_US |
| dc.publisher | MDPI | en_US |
| dc.relation.ispartof | Healthcare | en_US |
| dc.relation.publicationcategory | Makale - Uluslararası Hakemli Dergi - Kurum Öğretim Elemanı | en_US |
| dc.rights | info:eu-repo/semantics/openAccess | en_US |
| dc.subject | Rotator Cuff Injuries | en_US |
| dc.subject | Artificial Intelligence | en_US |
| dc.subject | Chatbots | en_US |
| dc.subject | Large Language Models | en_US |
| dc.subject | Patient Education | en_US |
| dc.subject | Health Literacy | en_US |
| dc.subject | Digital Health | en_US |
| dc.title | Balancing Accuracy and Readability: Comparative Evaluation of AI Chatbots for Patient Education on Rotator Cuff Tears | en_US |
| dc.type | Article | en_US |
| dspace.entity.type | Publication |