Keywords: Analysis/Processing, AI/ML Software, brain decoding, fMRI
Motivation: Understanding how the human brain decodes complex video stimuli can deepen our insight into multisensory integration and advance brain-computer interfaces.
Goal(s): This study aims to enhance video decoding from fMRI data by integrating visual, textual, and audio streams, hypothesizing that each stream independently contributes to brain representation.
Approach: Using the BOLDMoments dataset of video-fMRI pairs, we developed cross-subject decoding models that retrieve seen videos from fMRI activity by estimating the multimodal embeddings of pre-trained models (XCLIP, CLAP) directly from preprocessed fMRI data.
Results: Multimodal integration (Video+Text+Audio) achieved the highest retrieval and identification accuracy, underscoring the advantage of combined sensory information for decoding.
Impact: This work opens pathways for more accurate brain decoding in multisensory contexts, potentially advancing brain-computer interfaces and aiding clinical applications in sensory processing disorders.
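The retrieval setup described under Approach can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not state the regression model or the fusion scheme, so ridge regression and concatenation of L2-normalized embeddings are stand-ins, the data are synthetic random arrays rather than BOLDMoments recordings or real XCLIP/CLAP embeddings, and all function names are hypothetical.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def l2_normalize(A, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return A / (np.linalg.norm(A, axis=1, keepdims=True) + eps)


def top1_retrieval_accuracy(pred, target):
    """Fraction of predicted embeddings whose nearest candidate is the true one."""
    sim = l2_normalize(pred) @ l2_normalize(target).T
    return float(np.mean(np.argmax(sim, axis=1) == np.arange(len(target))))


def make_synthetic_data(seed=0, n_train=200, n_test=50, n_voxels=100,
                        dims=(16, 16, 16)):
    """Synthetic stand-in for fMRI and video/text/audio embeddings (assumption)."""
    rng = np.random.default_rng(seed)
    n = n_train + n_test
    # One random embedding matrix per modality (video, text, audio).
    embeddings = [rng.standard_normal((n, d)) for d in dims]
    # Assumed fusion: concatenate the L2-normalized per-modality embeddings.
    fused = np.hstack([l2_normalize(e) for e in embeddings])
    # Simulated voxel responses: a noisy linear mixture of the fused embedding.
    mixing = rng.standard_normal((fused.shape[1], n_voxels))
    fmri = fused @ mixing + 0.1 * rng.standard_normal((n, n_voxels))
    return fmri, fused, n_train


fmri, fused, n_train = make_synthetic_data()
# Learn a linear map from voxel activity to the fused embedding space.
W = fit_ridge(fmri[:n_train], fused[:n_train], alpha=1.0)
# Predict embeddings for held-out trials and retrieve among held-out stimuli.
pred = fmri[n_train:] @ W
accuracy = top1_retrieval_accuracy(pred, fused[n_train:])
print(f"top-1 retrieval accuracy on synthetic data: {accuracy:.2f}")
```

Restricting `dims` to a single modality gives a rough way to compare single-stream against fused decoding in this toy setting, though such a comparison says nothing about the reported results.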