Keywords: Other AI/ML, Segmentation
Motivation: Vision-Language Segmentation Models (VLSMs) have excelled in medical image segmentation by leveraging both image and text data, improving clinical support. However, limited access to fully paired image-report-mask datasets restricts their effectiveness.
Goal(s): We aim to boost VLSM performance by converting weakly paired datasets, in which only a fraction of images have reports, into fully paired datasets via pseudo-report generation, without additional training.
Approach: Using a Cross-modal Self-Retriever built on a pre-trained Vision-Language Model, we generated pseudo-reports for images lacking reports and trained VLSMs on the resulting image-report pairs.
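The abstract does not spell out the retrieval step, so the following is a minimal sketch of one plausible reading, assuming a frozen CLIP-style VLM: embed the small pool of real reports and each unpaired image, then assign each image its most similar report as a pseudo-report. The model checkpoint, cosine-similarity retrieval, and all function names below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical cross-modal self-retrieval sketch: a frozen CLIP-style
# model scores report candidates against each unpaired image, and the
# best match becomes that image's pseudo-report. No training involved.
import torch
from transformers import CLIPModel, CLIPProcessor

CHECKPOINT = "openai/clip-vit-base-patch32"  # assumed; the abstract names no model
model = CLIPModel.from_pretrained(CHECKPOINT).eval()
processor = CLIPProcessor.from_pretrained(CHECKPOINT)

@torch.no_grad()
def embed_reports(reports):
    """L2-normalized text embeddings for the small pool of real reports."""
    inputs = processor(text=reports, return_tensors="pt",
                       padding=True, truncation=True)
    feats = model.get_text_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

@torch.no_grad()
def retrieve_pseudo_report(image, reports, report_feats):
    """Return the report closest to the image in the joint embedding space."""
    inputs = processor(images=image, return_tensors="pt")
    img_feat = model.get_image_features(**inputs)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    sims = img_feat @ report_feats.T            # cosine similarity, shape (1, N)
    return reports[sims.argmax().item()]        # best-matching pseudo-report

# Usage (hypothetical): `report_pool` holds the small pool of real reports.
# report_feats = embed_reports(report_pool)
# pseudo = [retrieve_pseudo_report(img, report_pool, report_feats)
#           for img in images_without_reports]
```

The retrieved pseudo-reports would then stand in for the missing reports when training the VLSM on the now fully paired image-report-mask data.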
Results: Our method notably improved segmentation performance, outperforming image-only models on the Dice Similarity Coefficient (DSC) with as little as 10% of the data containing reports.
Impact: Our pseudo-report generation approach unlocks the full potential of VLSMs in report-limited environments without additional training, improving efficiency. Notably, with only 10% of reports available, it outperforms image-only models and reduces false positives more effectively, providing practical clinical benefits.