Gemini 1.5 Flash Provides the Most Reliable Content While ChatGPT-4o Offers the Highest Readability for Patient Education on Meniscal Tears

Date

2025

Publisher

Wiley

Abstract

Purpose

The aim of this study was to comparatively evaluate the responses generated by three advanced artificial intelligence (AI) models, ChatGPT-4o (OpenAI), Gemini 1.5 Flash (Google) and DeepSeek-V3, to frequently asked patient questions about meniscal tears in terms of reliability, usefulness, quality, and readability.

Methods

Responses from three AI chatbots, ChatGPT-4o (OpenAI), Gemini 1.5 Flash (Google) and DeepSeek-V3 (DeepSeek AI), were evaluated for 20 common patient questions regarding meniscal tears. Three orthopaedic specialists independently scored reliability and usefulness on 7-point Likert scales and overall response quality using the 5-point Global Quality Scale. Readability was analysed with six established indices. Inter-rater agreement was examined with intraclass correlation coefficients (ICCs) and Fleiss' Kappa, while between-model differences were tested using Kruskal-Wallis and ANOVA with Bonferroni adjustment.

Results

Gemini 1.5 Flash achieved the highest reliability, significantly outperforming both GPT-4o and DeepSeek-V3 (p = 0.001). While usefulness scores were broadly similar, Gemini was superior to DeepSeek-V3 (p = 0.045). Global Quality Scale scores did not differ significantly among models. In contrast, GPT-4o consistently provided the most readable content (p < 0.001). Inter-rater reliability was excellent across all evaluation domains (ICC > 0.9).

Conclusion

All three AI models generated high-quality educational content regarding meniscal tears. Gemini 1.5 Flash demonstrated the highest reliability and usefulness, while GPT-4o provided significantly more readable responses. These findings highlight the trade-off between reliability and readability in AI-generated patient education materials and emphasise the importance of physician oversight to ensure safe, evidence-based integration of these tools into clinical practice.
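The Methods refer to six established readability indices without naming them. As an illustration of how such an index works, the sketch below computes the widely used Flesch Reading Ease score (206.835 − 1.015 × words/sentence − 84.6 × syllables/word); the syllable counter is a crude vowel-group heuristic for demonstration only — the study's indices and tooling are not specified here, and production readability tools use dictionary-based syllable counts.

```python
import re

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).

    Higher scores indicate easier text; patient education materials are
    commonly targeted at roughly a sixth-grade level (score >= ~60).
    """
    # Count sentence-ending punctuation runs as sentence boundaries.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))

    def syllables(word: str) -> int:
        # Crude estimate: each maximal run of vowels counts as one syllable.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    n_syllables = sum(syllables(w) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (n_syllables / n_words)
```

Shorter sentences and shorter words raise the score, which is why chatbot answers written in plain, short sentences (as reported for GPT-4o here) grade as more readable.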

Keywords

ChatGPT, DeepSeek, Gemini, Large Language Models, Meniscal Tear, Patient Education

WoS Q

Q1

Scopus Q

Q1

Source

Knee Surgery Sports Traumatology Arthroscopy
