Vision transformers were used to predict total knee replacement within 9 years from magnetic resonance (MR) images. Inspired by MRNet, each 2D slice of an MR image was encoded by a vision transformer, and the slice encodings were aggregated to produce a single prediction for the 3D MR volume. Our results suggest that, for this outcome prediction task, vision transformers performed comparably to models based on convolutional neural networks. Moreover, models trained with the stochastic gradient descent optimizer performed better than those trained with the Adam optimizer.
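The slice-wise encoding and aggregation described above can be illustrated with a minimal sketch. The abstract does not state which aggregation was used, so the element-wise max pooling over slice encodings below is an assumption borrowed from the MRNet design. The names `toy_slice_encoder`, `max_pool_slices`, and `predict_volume` are hypothetical, and the encoder is a trivial stand-in for a vision transformer, not the model used in this work.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Slice = Sequence[Sequence[float]]  # one 2D slice as rows of pixel intensities


def toy_slice_encoder(slice_2d: Slice) -> Vector:
    """Hypothetical stand-in for a vision transformer encoder.

    Returns a 2-dimensional encoding: mean and max intensity of the slice.
    """
    pixels = [p for row in slice_2d for p in row]
    return [sum(pixels) / len(pixels), max(pixels)]


def max_pool_slices(encodings: Sequence[Vector]) -> Vector:
    """Element-wise max over slice encodings (assumed MRNet-style aggregation)."""
    return [max(values) for values in zip(*encodings)]


def predict_volume(
    volume: Sequence[Slice],
    encoder: Callable[[Slice], Vector],
    weights: Vector,
    bias: float,
) -> float:
    """Encode each slice, pool across slices, then apply a logistic output layer."""
    encodings = [encoder(s) for s in volume]
    pooled = max_pool_slices(encodings)
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Usage: a toy "volume" of three 2x2 slices and arbitrary illustrative weights.
volume = [
    [[0.0, 0.1], [0.2, 0.3]],
    [[0.5, 0.4], [0.3, 0.2]],
    [[0.1, 0.1], [0.1, 0.9]],
]
probability = predict_volume(volume, toy_slice_encoder, weights=[1.0, -0.5], bias=0.0)
```

Because the pooling step reduces any number of slice encodings to one fixed-length vector, the same output layer can serve volumes with different slice counts. That property is one reason this style of aggregation suits 3D MR data.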