See my notebook here, based on section 3 of the VAE paper.
It's worth comparing the VAE with the RBM (and MRFs in general). As a directed graphical model, the VAE doesn't have to deal with an intractable partition function, which the RBM approximates with MCMC in Contrastive Divergence. Instead, the posterior inference \(p_\theta(z|x)\) required for learning the VAE is intractable (whereas it is tractable and efficient for the RBM and other undirected models, thanks to their specially designed graphical structures), so an approximate I-projection \(q_\phi(z|x)\) is learned by maximizing the ELBO, with the Reparameterization Trick making its gradients estimable by backpropagation.

Conceptually, the VAE is no different from probabilistic PCA (and similar factor analysis models), except that neural networks, instead of simple linear-Gaussian maps, are used to parameterize the distributions \(q_\phi(z|x)\) and \(p_\theta(x|z)\), and the variational posteriors for the different data samples \(x^{(n)}\) share parameters, i.e. \(q(z^{(n)} \mid x^{(n)}) = q_\phi(z \mid x^{(n)})\) (which is typically not the case in mean-field variational inference, where each sample gets its own free variational parameters).
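To make the last two points concrete, here is a minimal sketch in PyTorch (the layer sizes, the Bernoulli likelihood, and all names are illustrative assumptions, not taken from the paper): a single shared encoder amortizes \(q_\phi(z|x)\) across all samples \(x^{(n)}\), and the Reparameterization Trick writes the sample \(z\) as a differentiable function of \((\mu, \sigma)\) and external noise \(\epsilon\).

```python
# A minimal sketch, assuming PyTorch; sizes and architecture are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=200, z_dim=20):
        super().__init__()
        # q_phi(z|x): one shared encoder amortizes inference over all samples x^(n)
        self.enc = nn.Linear(x_dim, h_dim)
        self.enc_mu = nn.Linear(h_dim, z_dim)
        self.enc_logvar = nn.Linear(h_dim, z_dim)
        # p_theta(x|z): decoder parameterizes a Bernoulli over pixels
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.Tanh(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = torch.tanh(self.enc(x))
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        # Reparameterization Trick: z = mu + sigma * eps with eps ~ N(0, I),
        # so gradients flow through mu and logvar instead of through sampling.
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps
        logits = self.dec(z)
        # Negative ELBO = reconstruction term + KL(q_phi(z|x) || p(z)),
        # where the KL to the standard-normal prior has a closed form.
        recon = F.binary_cross_entropy_with_logits(logits, x, reduction='sum')
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return (recon + kl) / x.shape[0]

# Minimizing this loss maximizes the ELBO, i.e. it finds the approximate
# I-projection q_phi(z|x) of the intractable posterior p_theta(z|x).
model = TinyVAE()
x = torch.rand(32, 784)   # stand-in batch; real data would go here
loss = model(x)
loss.backward()
```

Note how nothing here is per-sample: the same `enc` weights produce the variational parameters for every \(x^{(n)}\), which is exactly the parameter sharing that distinguishes this setup from classical mean-field variational inference.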