Silke Maria Lechner1,2, Daniel Butnaru2, Hans-Joachim Bungartz2, Dong Chen1,3, Mika W. Vogel1
1Advanced Medical Applications Laboratory, GE Global Research, Munich, Bavaria, Germany; 2Department of Scientific Computing in Computer Science, Technical University Munich, Munich, Bavaria, Germany; 3Department of Scientific Computations, Technical University Munich, Munich, Bavaria, Germany
We present a fast and flexible Bloch solver implemented on a graphics processing unit (GPU). Optimization techniques, namely improved memory allocation, controlled data copying and efficient task utilization, yield speed-ups of 6-12 times over conventional implementations and 2-6 times over other parallelized approaches. We review existing simulation optimizations and introduce the basic concept of our GPU implementation. Using simulated image acquisition, we demonstrate the benefit of the chosen optimization techniques, with emphasis on performance and accuracy.
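To make concrete why a Bloch simulation parallelizes well, the following minimal sketch updates each spin independently of all others, which is the property a GPU implementation can map to one thread per spin. It is not the implementation described here: it is plain CPU Python, and the function names (`rf_pulse_x`, `free_precession`, `simulate_spins`), the hard-pulse approximation, the rotation sign convention and the parameter values are all assumptions chosen for illustration.

```python
import math

def rf_pulse_x(m, flip_rad):
    """Instantaneous (hard) RF pulse: rotate m = (mx, my, mz) about the x axis.

    Assumed sign convention: a positive flip angle tips +z towards +y.
    """
    mx, my, mz = m
    c, s = math.cos(flip_rad), math.sin(flip_rad)
    return (mx, c * my + s * mz, -s * my + c * mz)

def free_precession(m, dt, t1, t2, off_res_hz, m0=1.0):
    """Closed-form Bloch solution over dt without RF: precession plus relaxation."""
    mx, my, mz = m
    phi = 2.0 * math.pi * off_res_hz * dt    # precession angle from off-resonance
    c, s = math.cos(phi), math.sin(phi)
    e1, e2 = math.exp(-dt / t1), math.exp(-dt / t2)
    return (
        e2 * (c * mx + s * my),              # transverse components decay with T2
        e2 * (-s * mx + c * my),
        m0 + (mz - m0) * e1,                 # longitudinal recovery with T1
    )

def simulate_spins(spins, flip_rad, dt):
    """Apply one pulse and one precession interval to every spin.

    Each entry of `spins` is a hypothetical dict with keys t1, t2, off_res_hz.
    The loop body has no cross-spin dependency, so on a GPU each iteration
    could be executed by its own thread.
    """
    result = []
    for spin in spins:
        m = rf_pulse_x((0.0, 0.0, 1.0), flip_rad)
        m = free_precession(m, dt, spin["t1"], spin["t2"], spin["off_res_hz"])
        result.append(m)
    return result
```

For example, `simulate_spins([{"t1": 1.0, "t2": 0.1, "off_res_hz": 0.0}], math.pi / 2, 0.1)` returns one magnetization vector whose transverse magnitude has decayed by a factor of e, because the interval equals T2.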