When I first read the WGAN paper a while ago, I only had a vague understanding of what's meant by "the data distribution may not have a density", and why this spells trouble for density estimation (and hence motivates likelihood-free learning).
I hope to clarify these and related issues in an accessible note **here**.

## The Ill-defined Problem of Maximum Likelihood Estimation

Oct 28 2021