Comparing the performance of ChatGPT and Chatsonic on PLAB-style questions: a cross-sectional study
DOI:
https://doi.org/10.18203/2349-3933.ijam20252532

Keywords:
Medicine, Data science, Artificial intelligence, Generative AI, ChatGPT, Chatsonic, Multiple choice questions

Abstract
Background: Artificial intelligence (AI), particularly large language models such as ChatGPT and Chatsonic, has garnered significant attention. These models, trained on massive datasets, generate human-like responses. Studies have assessed their performance on professional, licensing, and medical examinations, with varying levels of competency. This study assessed the competence of ChatGPT and Chatsonic in answering PLAB-style questions.
Method: We conducted an independent cross-sectional study in May 2023 to evaluate the performance of ChatGPT and Chatsonic on the PLAB-1 examination. The study used 180 multiple-choice questions from a mock test on the 'Pastest' platform; questions containing images or tables, or left unanswered by the AI models, were excluded. The responses of the two AI models, the correct answers, and question difficulty statistics were recorded, and the performance of the two models was compared on these metrics, as illustrated in the sketch below.
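For concreteness, a minimal scoring sketch follows; the data layout and field names are assumptions made purely for illustration, since the study states only that responses, correct answers, and difficulty statistics were recorded.

```python
# Hypothetical sketch (not the authors' code): scoring recorded responses
# against the answer key and stratifying accuracy by the Pastest difficulty
# rating. Field names and example rows are illustrative assumptions.
from collections import defaultdict

questions = [
    # each included question: correct option, difficulty, and each model's answer
    {"correct": "B", "difficulty": "easy",    "chatgpt": "B", "chatsonic": "C"},
    {"correct": "D", "difficulty": "average", "chatgpt": "D", "chatsonic": "D"},
    # ... remaining included questions
]

def accuracy_by_difficulty(questions, model):
    """Return {difficulty: fraction of questions the model answered correctly}."""
    correct, total = defaultdict(int), defaultdict(int)
    for q in questions:
        total[q["difficulty"]] += 1
        correct[q["difficulty"]] += (q[model] == q["correct"])
    return {d: correct[d] / total[d] for d in total}

for model in ("chatgpt", "chatsonic"):
    print(model, accuracy_by_difficulty(questions, model))
```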
Results: Of the 180 questions, 141 were included and 39 excluded. ChatGPT outperformed Chatsonic, answering 78% of questions correctly compared with 66% for Chatsonic. ChatGPT achieved 85% accuracy on easy questions, whereas Chatsonic trailed across all difficulty levels, answering 75% of easy questions, 64% of average questions, and only 38% of difficult questions correctly.
Conclusions: ChatGPT outperformed Chatsonic in all categories of the dataset, although its superior performance across difficulty levels did not reach statistical significance. The accuracy of both AI models decreased with increasing question difficulty.
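As a rough illustration of the overall comparison and the significance question raised in the conclusions, the reported percentages can be back-calculated into a 2x2 contingency table and checked with a chi-square test. The counts below are approximations reconstructed from the 141 included questions and the reported 78% and 66% accuracies, and the use of scipy is an assumption; the study does not specify its statistical software.

```python
# Illustrative sketch (not the authors' analysis): chi-square test of
# independence on approximate counts reconstructed from the reported results.
from scipy.stats import chi2_contingency

N = 141                               # questions included after exclusions
chatgpt_correct = round(0.78 * N)     # ~110 correct (78% reported)
chatsonic_correct = round(0.66 * N)   # ~93 correct (66% reported)

# 2x2 contingency table: rows = model, columns = correct / incorrect
table = [
    [chatgpt_correct, N - chatgpt_correct],
    [chatsonic_correct, N - chatsonic_correct],
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"ChatGPT accuracy:   {chatgpt_correct / N:.0%}")
print(f"Chatsonic accuracy: {chatsonic_correct / N:.0%}")
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```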