Comparing the performance of ChatGPT and Chatsonic on PLAB-style questions: a cross-sectional study
DOI:
https://doi.org/10.18203/2349-3933.ijam20252532

Keywords:
Medicine, Data science, Artificial intelligence, Generative AI, ChatGPT, Chatsonic, Multiple choice questions

Abstract
Background: Artificial intelligence (AI), particularly large language models such as ChatGPT and Chatsonic, has garnered significant attention. These models, trained on massive datasets, generate human-like responses. Studies have assessed their performance on professional, licensing, and medical examinations, with varying levels of competency. This study assessed the competence of ChatGPT and Chatsonic in answering PLAB-style questions.
Method: We conducted an independent cross-sectional study in May 2023 to evaluate the performance of ChatGPT and Chatsonic on the PLAB-1 examination. The study used 180 multiple-choice questions from a mock test on the 'Pastest' platform; questions containing images or tables, or left unanswered by the AI models, were excluded. The responses of the two AI models, the correct answers, and question difficulty statistics were recorded, and the performance of the two models was compared on these metrics, as illustrated in the sketch below.
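For concreteness, a minimal scoring sketch follows; the data layout and field names are assumptions made purely for illustration, since the study states only that responses, correct answers, and difficulty statistics were recorded.

```python
# Hypothetical sketch (not the authors' code): scoring recorded responses
# against the answer key and stratifying accuracy by the Pastest difficulty
# rating. Field names and example rows are illustrative assumptions.
from collections import defaultdict

questions = [
    # each included question: correct option, difficulty, and each model's answer
    {"correct": "B", "difficulty": "easy",    "chatgpt": "B", "chatsonic": "C"},
    {"correct": "D", "difficulty": "average", "chatgpt": "D", "chatsonic": "D"},
    # ... remaining included questions
]

def accuracy_by_difficulty(questions, model):
    """Return {difficulty: fraction of questions the model answered correctly}."""
    correct, total = defaultdict(int), defaultdict(int)
    for q in questions:
        total[q["difficulty"]] += 1
        correct[q["difficulty"]] += (q[model] == q["correct"])
    return {d: correct[d] / total[d] for d in total}

for model in ("chatgpt", "chatsonic"):
    print(model, accuracy_by_difficulty(questions, model))
```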
Results: Of the 180 questions, 141 were included and 39 excluded. ChatGPT outperformed Chatsonic, answering 78% of questions correctly compared with 66% for Chatsonic. ChatGPT achieved 85% accuracy on easy questions, whereas Chatsonic trailed across all difficulty levels, answering 75% of easy questions, 64% of average questions, and only 38% of difficult questions correctly.
Conclusions: ChatGPT outperformed Chatsonic in all categories of the dataset, although its superior performance across difficulty levels did not reach statistical significance. The accuracy of both AI models decreased with increasing question difficulty.
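As a rough illustration of the overall comparison and the significance question raised in the conclusions, the reported percentages can be back-calculated into a 2x2 contingency table and checked with a chi-square test. The counts below are approximations reconstructed from the 141 included questions and the reported 78% and 66% accuracies, and the use of scipy is an assumption; the study does not specify its statistical software.

```python
# Illustrative sketch (not the authors' analysis): chi-square test of
# independence on approximate counts reconstructed from the reported results.
from scipy.stats import chi2_contingency

N = 141                               # questions included after exclusions
chatgpt_correct = round(0.78 * N)     # ~110 correct (78% reported)
chatsonic_correct = round(0.66 * N)   # ~93 correct (66% reported)

# 2x2 contingency table: rows = model, columns = correct / incorrect
table = [
    [chatgpt_correct, N - chatgpt_correct],
    [chatsonic_correct, N - chatsonic_correct],
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"ChatGPT accuracy:   {chatgpt_correct / N:.0%}")
print(f"Chatsonic accuracy: {chatsonic_correct / N:.0%}")
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```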