When I first read the Wasserstein GAN (WGAN) paper a while ago, I had only a vague understanding of what is meant by "the data distribution may not have a density" and of why this spells trouble for density estimation (and hence motivates likelihood-free learning). In this note I hope to clarify these and related issues in an accessible way.
The Ill-defined Problem of Maximum Likelihood Estimation
Oct 28 2021