A recurrent problem in applying deep learning models to medical imaging is the limited availability of labelled training data. A common approach is therefore to work on image patches rather than whole volumes, which increases the number of samples. However, for many diseases, anomalous patches (positive samples) are outnumbered by negative patches showing no anomaly. Here, we explore different strategies for negative sampling in the context of brain aneurysm detection. We show that classification performance can vary drastically depending on the negative sampling strategy, and that accounting for real-world disease or anomaly prevalence can further degrade performance estimates.
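
As an illustration of why prevalence matters for performance estimates, the sketch below applies Bayes' rule to compute precision (positive predictive value) from sensitivity, specificity, and prevalence. It is not the evaluation procedure used in this work: the function name and the 90% sensitivity and specificity figures are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def precision_at_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Precision (positive predictive value) implied by Bayes' rule.

    Assumes sensitivity and specificity stay fixed when the share of
    positive samples changes, which is an idealisation.
    """
    true_pos = sensitivity * prevalence                    # P(predicted +, actual +)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # P(predicted +, actual -)
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier: 90% sensitivity and 90% specificity.
balanced = precision_at_prevalence(0.90, 0.90, 0.50)  # balanced test set
rare = precision_at_prevalence(0.90, 0.90, 0.01)      # 1 positive patch per 100

print(f"precision at 50% prevalence: {balanced:.3f}")
print(f"precision at  1% prevalence: {rare:.3f}")
```

Under these assumed numbers, the same classifier that reaches a precision of 0.9 on a balanced test set drops to roughly 0.08 when only 1% of patches are positive, so an estimate obtained on artificially balanced patches can be far more optimistic than one that reflects real-world prevalence.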