The brain is a complex dynamic system that constantly evolves. Characterizing the spatiotemporal dynamics of brain activity is fundamental to understanding how the brain works. Current studies based on functional connectivity and linear models are limited by reduced temporal resolution and insufficient model capacity. Using a variational autoencoder (VAE), a generative model, the present study mapped the high-dimensional transient co-activity patterns (CAPs) of large datasets into a low-dimensional latent space. We demonstrated with multiple datasets that the VAE can effectively represent transient CAPs in the latent space, paving the way for frame-wise modeling of complex spatiotemporal brain dynamics in the future.
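
To make the idea of mapping frames into a latent space concrete, the following is a minimal sketch of a generic VAE forward pass and loss, written with NumPy only. It is not the architecture used in this study: the layer sizes, the single hidden layer, the Gaussian (squared-error) reconstruction term, the omission of bias terms, and all function and variable names are assumptions chosen for illustration, and the random input merely stands in for real activity frames.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 1000 spatial features per frame, 8 latent dimensions.
N_FEATURES, N_HIDDEN, N_LATENT = 1000, 64, 8


def init_params(scale=0.01):
    """Randomly initialised weights for a one-hidden-layer encoder and decoder."""
    return {
        "W_enc": rng.normal(0, scale, (N_FEATURES, N_HIDDEN)),
        "W_mu": rng.normal(0, scale, (N_HIDDEN, N_LATENT)),
        "W_logvar": rng.normal(0, scale, (N_HIDDEN, N_LATENT)),
        "W_dec1": rng.normal(0, scale, (N_LATENT, N_HIDDEN)),
        "W_dec2": rng.normal(0, scale, (N_HIDDEN, N_FEATURES)),
    }


def encode(x, p):
    """Map each frame to the mean and log-variance of q(z | x)."""
    h = np.tanh(x @ p["W_enc"])
    return h @ p["W_mu"], h @ p["W_logvar"]


def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def decode(z, p):
    """Reconstruct a frame from its latent vector."""
    return np.tanh(z @ p["W_dec1"]) @ p["W_dec2"]


def negative_elbo(x, p):
    """Return the batch-averaged negative ELBO and the latent means."""
    mu, logvar = encode(x, p)
    z = reparameterize(mu, logvar)
    x_hat = decode(z, p)
    # Squared-error reconstruction term (Gaussian likelihood, unit variance).
    recon = 0.5 * np.sum((x - x_hat) ** 2, axis=1)
    # Closed-form KL divergence between q(z | x) and the N(0, I) prior.
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1)
    return np.mean(recon + kl), mu


params = init_params()
frames = rng.standard_normal((32, N_FEATURES))  # stand-in for 32 activity frames
loss, latent = negative_elbo(frames, params)
print(latent.shape)  # one low-dimensional latent vector per frame
```

In practice the weights would be fitted by minimizing this loss with a gradient-based optimizer in an automatic-differentiation framework; the sketch only shows how each individual frame receives its own latent representation, which is what allows frame-wise analysis without averaging over time windows.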