Translating acquired sequences to missing ones in multi-contrast MRI protocols can dramatically reduce scan costs. Neural network models devised for this purpose are typically trained on paired datasets, which can be difficult to compile. Moreover, these models rely exclusively on convolutional operators, which carry undesirable biases toward feature locality and spatial invariance. Here, we present a cycle-consistent translation model, ResViT, that enables training on unpaired datasets. ResViT combines the localization power of convolutional operators with the contextual sensitivity of transformers. Demonstrations on multi-contrast MRI datasets indicate the superiority of ResViT over state-of-the-art translation models.
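The cycle-consistency idea underlying unpaired training can be illustrated with a minimal NumPy sketch. The two "generators" below are hypothetical invertible linear maps standing in for the actual networks, which the abstract does not specify in detail; the point is only the bidirectional L1 reconstruction objective that replaces paired supervision.

```python
import numpy as np

# Hedged sketch of a cycle-consistency objective for unpaired translation.
# G maps contrast A -> B, F maps B -> A. Both are hypothetical linear
# stand-ins for the real generator networks described in the paper.

rng = np.random.default_rng(0)
W_g = np.eye(16) + 0.1 * rng.normal(size=(16, 16))
W_f = np.linalg.inv(W_g)  # make F exactly invert G for this demo

def G(x):
    """Translate source contrast A to target contrast B."""
    return x @ W_g

def F(y):
    """Translate target contrast B back to source contrast A."""
    return y @ W_f

def cycle_loss(x_a, y_b):
    """L1 reconstruction error after a full cycle in both directions.

    No paired (x_a, y_b) correspondence is needed: each image only has
    to survive the round trip through the two generators.
    """
    loss_a = np.abs(F(G(x_a)) - x_a).mean()  # A -> B -> A
    loss_b = np.abs(G(F(y_b)) - y_b).mean()  # B -> A -> B
    return loss_a + loss_b

# Unpaired batches of flattened "images" from each contrast.
x_a = rng.normal(size=(4, 16))
y_b = rng.normal(size=(4, 16))
print(cycle_loss(x_a, y_b))  # near zero, since F inverts G here
```

In training, this cycle term would be minimized jointly with adversarial losses on both domains; here F is constructed as the exact inverse of G, so the loss is already near zero.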