Keywords: Language Models
Motivation: The potential of large language models (LLMs) in automating complex medical tasks, such as TNM staging from breast cancer DCE-MRI reports, remains unexplored.
Goal(s): To evaluate and compare the effectiveness of ChatGPT 4.0, ChatGPT 3.5, and Google Bard in automating TNM staging using zero-shot and few-shot learning approaches.
Approach: We analyzed 745 DCE-MRI reports using different LLMs and learning strategies, assessing intra- and inter-LLM agreement, accuracy, and AUC.
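As a rough illustration of two of the metrics named above, the sketch below computes accuracy against a reference staging and a chance-corrected agreement score between two models. It is a minimal sketch under stated assumptions, not the study's pipeline: the abstract does not say which agreement statistic was used, so Cohen's kappa is an assumption here, and the stage labels and model outputs are hypothetical toy values. AUC is not covered.

```python
from collections import Counter


def accuracy(predicted, reference):
    """Fraction of reports whose predicted stage equals the reference stage."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two label sequences (assumed agreement statistic)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of reports where both raters give the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters always use the same single label; treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data: T-stage labels for four reports.
reference = ["T1", "T2", "T2", "T3"]
model_a = ["T1", "T2", "T3", "T3"]
model_b = ["T1", "T2", "T2", "T2"]

acc_a = accuracy(model_a, reference)
kappa_ab = cohen_kappa(model_a, model_b)
print(f"accuracy (model_a vs reference): {acc_a:.2f}")
print(f"kappa (model_a vs model_b): {kappa_ab:.2f}")
```

The same `cohen_kappa` function could be applied to two runs of one model for intra-LLM agreement, or to two different models for inter-LLM agreement.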
Results: ChatGPT 4.0 outperformed the other models (AUC: 0.89 with few-shot learning). Few-shot learning significantly improved every model's performance, with Bard showing the largest gain (a 14.8-percentage-point increase in AUC).
Impact: This study demonstrates the potential of LLMs, especially ChatGPT 4.0, in automating breast cancer TNM staging from DCE-MRI reports. The effectiveness of few-shot learning suggests a pathway for rapid adaptation of AI in radiology, potentially enhancing diagnostic efficiency and accuracy.