> This intuition only works for Gaussian or maybe a few other distributions, not for a general p(x)
I don't think that's true. Jensen's inequality says that if X is a random variable and φ is a convex function, then
φ(E[X]) ≤ E[φ(X)].
X doesn't need to be Gaussian; only φ needs to be convex.
An intuitive way to see it: if φ is convex, it is cup-shaped. If I sample two points x_1 and x_2 from X and draw the chord from φ(x_1) to φ(x_2), that chord clearly lies above the curve φ at every point between x_1 and x_2, right? Jensen's inequality just generalises that to all the points of X: since E[X] sits somewhere in the middle of the samples, φ(E[X]) sits down in the bottom of the cup, so it is smaller than (φ(x_1) + φ(x_2) + ... + φ(x_n))/n, which approximates E[φ(X)].
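A quick numerical sketch of the claim (assuming NumPy is available; the bimodal mixture here is just an illustrative choice, not from the post): even for a distribution that is nothing like a Gaussian, the sample version of φ(E[X]) ≤ E[φ(X)] holds for a convex φ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample from a deliberately bimodal mixture of two well-separated Gaussians.
n = 100_000
mix = rng.random(n) < 0.3
x = np.where(mix, rng.normal(-4.0, 1.0, n), rng.normal(3.0, 0.5, n))

phi = np.square  # x^2 is convex (cup-shaped)

lhs = phi(x.mean())   # φ(E[X]): square of the sample mean
rhs = phi(x).mean()   # E[φ(X)]: mean of the squared samples

print(f"phi(E[X]) = {lhs:.3f},  E[phi(X)] = {rhs:.3f}")
assert lhs <= rhs     # Jensen: φ(E[X]) ≤ E[φ(X)]
```

The gap is large here because the mixture spreads mass far from its mean, and x² punishes that spread; the inequality itself needs nothing about the shape of the distribution.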
That's what I mean. Jensen's inequality applies to any distribution, but the intuition presented in the post only works for simple distributions, and all the examples are Gaussian/binomial-like. It would be difficult to make the same points with something multimodal or arbitrary.