The Ill-defined Problem of Maximum Likelihood Estimation

Illustrating a simple data distribution in R^2 that does not have a density.

When I first read the WGAN paper a while ago, I had only a vague understanding of what is meant by "the data distribution may not have a density", and why this spells trouble for density estimation (and hence motivates likelihood-free learning). I hope to clarify these and related issues in this accessible note.