3D imaging sequences such as GRASE or RARE-SoSP are the preferred choice for acquiring arterial spin labeling (ASL) images. However, a tradeoff must be made between the number of segments and the image blurring caused by T2 decay: fewer segments shorten the acquisition but lengthen each echo train, which increases blurring. In this study, we propose a reconstruction algorithm based on total generalized variation (TGV) that allows the number of segments, and therefore the acquisition time per image, to be reduced. Instead of reconstructing each image individually and averaging afterwards, we incorporate the averaging procedure into the reconstruction process. This makes it possible to exploit temporal redundancy and spatial similarity to improve the reconstruction quality of ASL images.
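For context, the averaging procedure referred to above is conventionally applied after reconstruction: each control/label pair is reconstructed separately, subtracted, and the differences are averaged to obtain a perfusion-weighted image. The following is a minimal sketch of that conventional baseline only, not of the proposed TGV-based reconstruction. The function name `perfusion_weighted_mean`, the array sizes, and the signal and noise levels are all hypothetical and chosen purely for illustration.

```python
import numpy as np


def perfusion_weighted_mean(control, label):
    """Average the pairwise control-label differences over axis 0 (repetitions)."""
    control = np.asarray(control, dtype=float)
    label = np.asarray(label, dtype=float)
    if control.shape != label.shape:
        raise ValueError("control and label must have the same shape")
    return (control - label).mean(axis=0)


# Synthetic data (assumed values, not taken from the study).
rng = np.random.default_rng(0)
n_pairs, volume_shape = 20, (8, 8, 4)
sigma = 2.0                                # noise level per image
base = np.full(volume_shape, 100.0)        # static tissue signal
true_diff = np.full(volume_shape, 1.0)     # small perfusion-related difference

control = base + true_diff + rng.normal(0.0, sigma, (n_pairs, *volume_shape))
label = base + rng.normal(0.0, sigma, (n_pairs, *volume_shape))

pwi = perfusion_weighted_mean(control, label)

# Compare the error of a single difference image with the averaged one.
single_pair_error = np.std(control[0] - label[0] - true_diff)
averaged_error = np.std(pwi - true_diff)
print(f"single pair: {single_pair_error:.3f}, averaged: {averaged_error:.3f}")
```

In this sketch each repetition is treated independently and noise is reduced only by the final mean. Moving the averaging into the reconstruction, as the text proposes, is what would let a regularizer act on all repetitions jointly.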