Virginia Bartlett, PhD – Assistant Director, Center for Healthcare Ethics, Cedars-Sinai Medical Center; Joseph Fanning, PhD – Associate Professor, Center for Biomedical Ethics & Society, and Department of Biomedical Informatics, Vanderbilt University Medical Center; Paul Ford, PhD – Associate Professor, Neurological Institute, Cleveland Clinic
Abstract: Recent advancements in large language models (LLMs), such as OpenAI's GPT-4, have shown mixed results in accuracy and safety across various medical specialties. While these models may offer benefits beyond narrow medical use cases, their potential to enhance organizational ethics remains underexplored. Furthermore, political philosopher Michael Sandel questions whether smart machines can outthink humans or whether human judgment remains indispensable in critical life decisions. This evaluation investigates the application of GPT-4 in clinical bioethics, a field requiring nuanced ethical reasoning and recommendations that directly affect patient care.
We evaluated GPT-4's ethical analysis and recommendations using three publicly available cases from respected bioethics websites. The output was assessed by four clinical bioethicists from three hospitals using quantitative and qualitative measures. GPT-4 scored an average of 1.78 and 1.67 (on a 5-point scale) for the quality of ethical analysis and recommendations, respectively, with instances of misinformation or harmful content noted in each case. Despite these shortcomings, GPT-4 demonstrated a synoptic grasp of bioethics principles and highlighted overlooked ethical issues, particularly related to justice.
However, the model's eloquence often masked significant deficiencies, including the omission of religious, cultural, and legal considerations, and a lack of patient-centered context. Consequently, GPT-4 should not independently provide ethical guidance but may supplement clinical decision-making and serve as a teaching tool. Further training on state laws, organizational policies, and real-life bioethics cases is recommended to enhance its utility. Collaboration between hospital systems, ethicists, and LLM developers is essential to refine these tools for ethical clinical practice.
Keywords: Large Language Models (LLMs), Artificial Intelligence (AI), Innovation and Ethics
Learning Objectives:
After participating in this conference, attendees should be able to:
Evaluate the Ethical Analysis Capabilities of LLMs: Participants will be able to assess the quality of ethical analysis and recommendations produced by large language models like GPT-4.
Identify Key Ethical Considerations in Bioethics Consultations: Participants will understand the limitations of LLMs in addressing cultural, legal, and patient-centered aspects of clinical ethics cases.
Develop Strategies for Integrating LLMs in Clinical Practice: Participants will explore compliant, responsible, and practical approaches for using LLMs as supplementary tools in clinical decision-making and bioethics education.