Keywords: Language Models, AI/ML Image Reconstruction, Image Synthesis
Motivation: We aim to introduce a foundation model based on visual and textual inputs to enable robust, unified image synthesis in multimodal MRI.
Goal(s): Our goal is to demonstrate a versatile foundation model that uses language guidance to describe target modalities accurately and adapts easily to new modalities and datasets through computationally efficient fine-tuning with minimal additional data and training.
Approach: Our approach conditions synthesis on source-modality images and target-modality text descriptions, using a text encoder to embed the textual inputs, a one-step latent diffusion model for fast synthesis, and low-rank adaptation for efficient fine-tuning.
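To make the approach concrete, the sketch below illustrates the general structure in PyTorch: a text embedding fused with a source-modality latent, a single conditional forward pass (no iterative sampling), and a low-rank adapter on a frozen pretrained layer. All names, shapes, and modules (text_encoder, backbone, decoder, synthesize) are hypothetical stand-ins, not the authors' implementation; only the overall scheme follows the abstract.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update (LoRA)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # keep pretrained weights fixed
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)         # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

# Toy stand-ins for the real components (hypothetical names and shapes).
text_encoder = nn.Sequential(                  # embeds a tokenized target description
    nn.Embedding(32, 64), nn.Flatten(1), nn.Linear(8 * 64, 64))
backbone = LoRALinear(nn.Linear(256 + 64, 256), rank=8)  # LoRA-adapted synthesis layer
decoder = nn.Linear(256, 1024)                 # maps the predicted latent to image space

def synthesize(source_latent, prompt_tokens):
    """One-step synthesis: a single conditional forward pass, no iterative sampling."""
    text_emb = text_encoder(prompt_tokens)                # e.g. "T2-weighted brain MRI"
    cond = torch.cat([source_latent, text_emb], dim=-1)   # fuse source image + text condition
    return decoder(backbone(cond))                        # predict the target image directly

out = synthesize(torch.randn(1, 256), torch.randint(0, 32, (1, 8)))
print(out.shape)  # torch.Size([1, 1024])
```

In this sketch only the small down/up adapter matrices are trainable, which is what makes low-rank fine-tuning cheap in data and compute, consistent with the efficiency goal stated above.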
Results: We demonstrated high-quality synthesis performance across various modalities and datasets.
Impact: Conventional synthesis models rely on image-to-image translation from visual inputs alone and often show limited generalizability. We demonstrate a language-guided foundation model that leverages textual inputs for improved adaptability to new modalities.