The volume of specific brain structures is of clinical interest in many brain diseases. By comparing a patient’s volumes against a volumetric reference range for healthy subjects, radiologists can help refine a diagnosis. However, both scanner and subject characteristics affect the construction and use of these reference ranges. Using a diverse dataset spanning 80 MRI scanners and 302 subjects, we show that Alzheimer’s disease detection from hippocampal volume is robust to a mismatch between the training (development) and testing (deployment) environments, although the mismatch has some influence, but that estimates of atrophy rates can vary considerably depending on the training set used. Radiologists should interpret statistical results from volumetry accordingly.