I think the film would have been better (though perhaps less successful) if Besson had toned down the occasionally exaggerated tomfoolery, like Chris Tucker's character, or the spaceship Evil (the orb described in the article) which felt almost like a SciFi parody taken out of the movie Spaceballs.
The pacing, the great costumes and set design by Moebius, the actors Bruce Willis and Milla Jovovich, and the unusual ideas (like the alien opera singer) were all more than enough to carry the movie.
> D(P||Q) = measure of how much our model Q differs from the true distribution P. In other words, we care about how much P and Q differ from each other in the world where P is true, which explains why KL-div is not symmetric.
I don't think this particular interpretation actually makes sense or would explain why KL divergence is not symmetric.
First of all, the "difference" between P and Q would be the same independently of whether P, Q, or some other distribution is the "true" distribution.
For example, assume we have a coin and P(Heads)=0.4 and Q(Heads)=0.6. Now the difference between the two distributions is clearly the same irrespective of whether P, Q or neither is "true". So this interpretation doesn't explain why the KL divergence is asymmetric.
Second, there are plausible cases where it arguably doesn't even make sense to speak of a "true" distribution in the first place.
For example, consider the probability that there was once life on Mars. Assume P(Life)=0.4 and Q(Life)=0.6. What would it even mean for P to be "true"? P and Q could simply represent the subjective beliefs of two different people, without any requirement of assuming that one of these probabilities could be "correct".
Clearly the KL divergence can still be calculated and presumably sensibly interpreted even in the subjective case. But the interpretations in this article don't help us here since they require objective probabilities where one distribution is the "true" one.
> First of all, the "difference" between P and Q would be the same independently of whether P, Q, or some other distribution is the "true" distribution.
I don't think this is the case in general, because D_{KL}(P||Q) weights the log probability ratio by P(x), whereas D_{KL}(Q||P) weights it by Q(x).
So let's think it through with an example. Say P is the true probability of frequencies of English words and Q is the output of a model that's attempting to estimate this.
Say the model overestimates the frequency of some uncommon word (e.g. "ancillary"). D_{KL}(P||Q) weights by P(x), the actual frequency, so that word contributes little to the divergence. But since the model thinks the frequency of that word is high, D_{KL}(Q||P) weights by Q(x), the model's estimated frequency, so that error is weighted heavily and D_{KL}(Q||P) will be large.
That's why it's not symmetric - it's weighting by the first distribution so the "direction" of the error matters.
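A toy calculation makes the weighting visible. This is only a minimal sketch: the three-word vocabulary and all of the frequencies are made up for illustration, and `kl_divergence` is a hypothetical helper, not something from the article.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in nats for two dicts mapping outcome -> probability."""
    return sum(p[x] * math.log(p[x] / q[x]) for x in p if p[x] > 0)

# Hypothetical "true" word frequencies (made-up numbers).
true_freq = {"the": 0.600, "of": 0.399, "ancillary": 0.001}
# Hypothetical model: inflates "ancillary" to 10% and shrinks the rest to compensate.
model_freq = {"the": 0.540, "of": 0.360, "ancillary": 0.100}

d_pq = kl_divergence(true_freq, model_freq)  # terms weighted by the true frequencies
d_qp = kl_divergence(model_freq, true_freq)  # terms weighted by the model's frequencies

# Contribution of the rare word alone in each direction.
rare_pq = true_freq["ancillary"] * math.log(true_freq["ancillary"] / model_freq["ancillary"])
rare_qp = model_freq["ancillary"] * math.log(model_freq["ancillary"] / true_freq["ancillary"])

print(f"D(P||Q) = {d_pq:.4f}, rare-word term = {rare_pq:.4f}")
print(f"D(Q||P) = {d_qp:.4f}, rare-word term = {rare_qp:.4f}")
```

With these numbers the rare word's term is tiny in D(P||Q) and dominates D(Q||P), so the second divergence comes out several times larger than the first.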
You misunderstood what I was saying. I was not suggesting that the KL divergence is symmetric. I was saying that it would be symmetric (and independent of the "truth" of a distribution) if it was interpreted as the quoted measure of "difference" between two distributions. So that proposed interpretation is wrong.
To the first point, I think that the KL divergence is indeed symmetric in this case, 0.4 * ln(0.4 / 0.6) + 0.6 * ln(0.6 / 0.4) no matter which direction you go.
Still, there's no avoiding the inherent asymmetry in KL divergence. To my mind, the best we can do is to say that from P's perspective, this is how weird the distribution Q looks.
> To the first point, I think that the KL divergence is indeed symmetric in this case, 0.4 * ln(0.4 / 0.6) + 0.6 * ln(0.6 / 0.4) no matter which direction you go.
But my argument also works for any other probability distribution, e.g. P(heads)=0.5 vs Q(heads)=0.99.
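Both coin pairs mentioned in this thread can be checked numerically. A minimal sketch, where `kl_coin` is a hypothetical helper that takes the heads probabilities of the two coins:

```python
import math

def kl_coin(p_heads, q_heads):
    """D_KL(P || Q) in nats for two coins given their heads probabilities."""
    p = (p_heads, 1.0 - p_heads)
    q = (q_heads, 1.0 - q_heads)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

for p_heads, q_heads in [(0.4, 0.6), (0.5, 0.99)]:
    forward = kl_coin(p_heads, q_heads)   # D(P||Q)
    backward = kl_coin(q_heads, p_heads)  # D(Q||P)
    print(f"P={p_heads}, Q={q_heads}: D(P||Q)={forward:.4f}, D(Q||P)={backward:.4f}")
```

The 0.4 vs 0.6 pair is a special case: Q is just P with heads and tails swapped, so both directions sum the same two terms. For 0.5 vs 0.99 the two directions give clearly different values.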
> Still, there's no avoiding the inherent asymmetry in KL divergence.
I wasn't suggesting otherwise, I was talking about his interpretation.
Unfortunately all these intuitions rely on a distinction between a "true" distribution P and a "false" distribution Q. So they don't work for a subjective probability interpretation where it doesn't make sense to speak of a true or false distribution.
The math doesn't need a 'true' or 'false' distribution; that just falls out of the use of a model ('false') to approximate reality ('true'). When the bard says "there are more things in heaven and earth, Horatio, than are dreamt of in your philosophy," he's also saying that the KL Divergence between Horatio's beliefs and reality is infinite.
We can also apply the concept between two subjective distributions. If I'm indifferent to sports teams (very broad distribution) and you're a rabid fan of A (sharp, narrow distribution), then it might take you a long time to express a point in a way I'll understand – but conversely I might be able to express "team B is good actually" in a way that just does not compute for you.
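Here is one way to put numbers on both remarks, as a sketch only. The four teams and every probability are made up, `kl_divergence` is a hypothetical helper, and mapping "does not compute" onto an infinite divergence is my reading of the analogy, not something the math forces.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in nats; infinite if q rules out something p allows."""
    total = 0.0
    for outcome, p_x in p.items():
        if p_x == 0:
            continue  # 0 * log(0 / q) is taken as 0 by convention
        q_x = q.get(outcome, 0.0)
        if q_x == 0:
            return math.inf  # q assigns zero probability to a possible outcome
        total += p_x * math.log(p_x / q_x)
    return total

# Hypothetical beliefs about which of four teams is best.
indifferent = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
fan = {"A": 0.97, "B": 0.01, "C": 0.01, "D": 0.01}
dogmatic_fan = {"A": 1.0, "B": 0.0, "C": 0.0, "D": 0.0}

print(kl_divergence(fan, indifferent))           # finite
print(kl_divergence(indifferent, fan))           # finite, but larger
print(kl_divergence(indifferent, dogmatic_fan))  # inf
print(kl_divergence(dogmatic_fan, indifferent))  # finite: log(4)
```

No distribution here is labelled "true"; the asymmetry comes only from which set of beliefs does the weighting. The divergence becomes infinite as soon as the second distribution gives zero probability to something the first considers possible.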
It will be more relevant in the future, but it's still worth thinking about. Right now intermittent energy sources cover around 15-18% of the total energy consumption in Germany[1]. And seasonal variability is covered by other methods (natural gas and others).
But since two thirds of the fossil energy is wasted, it's more like 40% of the useful energy.
Some people don't know about the primary energy fallacy, others know about it and try to exploit it, so you should be suspicious of the opinions of anyone trying to use it to suggest lack of progress and futility.
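The arithmetic behind that conversion, as a rough sketch. It assumes everything not covered by intermittent sources is fossil fuel converted at one-third efficiency, and that the intermittent sources have no comparable conversion loss. Both are simplifications, and `useful_energy_share` is a hypothetical helper.

```python
def useful_energy_share(primary_share, fossil_efficiency=1 / 3):
    """Share of useful energy delivered by sources covering `primary_share`
    of primary energy, assuming the remainder is fossil fuel converted at
    `fossil_efficiency` and the sources themselves have no conversion loss."""
    useful_from_sources = primary_share
    useful_from_fossil = (1.0 - primary_share) * fossil_efficiency
    return useful_from_sources / (useful_from_sources + useful_from_fossil)

for share in (0.15, 0.18):
    print(f"{share:.0%} of primary energy -> {useful_energy_share(share):.1%} of useful energy")
```

Under these assumptions, 15-18% of primary energy works out to roughly 35-40% of useful energy. A higher assumed fossil efficiency pulls that number back down.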
If you measure useful energy as the electricity output of a fossil fuel plant, then yes. But in many cases the waste heat is used in other applications, for example district heating or low-grade industrial heat.
If you use fossil fuel to directly drive an industrial process, for example melting of ores/metals/glass then the efficiency is much higher.
Electricity can still be more efficient for many of these with heat pumps, like indoor heating and steam production. The gap is smaller than for working engines, of course.
And Germany right now has battery storage equivalent to fully powering the country for about 30 minutes, and it's rising every month, which is quite wild. https://battery-charts.de/
With those feeding on negative-priced electricity, intermittent sources will only get more economical to the detriment of gas and nuclear.
Every hour you don't run your nuclear power plant at full capacity you lose money. Nuclear power is mostly capex. You need to maximize utilization if you want to be profitable.
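To see why utilization matters so much for a capex-heavy plant, here is a sketch with entirely hypothetical numbers (a 1000 MW plant with 500 million per year in capital and fixed costs, no fuel or variable costs modelled). `fixed_cost_per_mwh` is an illustrative helper, not a real costing model.

```python
HOURS_PER_YEAR = 8760

def fixed_cost_per_mwh(annual_fixed_cost, capacity_mw, capacity_factor):
    """Fixed cost spread over the energy actually generated in a year."""
    energy_mwh = capacity_mw * HOURS_PER_YEAR * capacity_factor
    return annual_fixed_cost / energy_mwh

# Hypothetical plant: 1000 MW, 500 million per year of capital and fixed costs.
for cf in (0.9, 0.6, 0.45):
    cost = fixed_cost_per_mwh(500e6, 1000, cf)
    print(f"capacity factor {cf:.0%}: {cost:.1f} per MWh")
```

Because the fixed cost does not shrink when the plant idles, halving the capacity factor doubles the fixed cost that each MWh has to carry.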
It's far worse not to have sufficient electricity during the night or on overcast days. You can just increase nuclear electricity prices during that time to make up for the lost revenue from sunny days.
I don't think there's strong evidence of this being an ad. I was surprised to see the Intel Arc A770, a GPU I've never heard of, included on this list. I think it's just that Nvidia has been the dominant force in consumer-level GPUs for a while now.
> I don't think there's strong evidence of this being an ad.
There is strong evidence. Click on the link above. It was posted by a viral marketing company. They even feature the GPU story on their website: https://sheets.works/data-viz
> I was surprised to see the Intel Arc A770, a GPU I've never heard of, included on this list.
Yes, because otherwise the ad would be too obvious.
27 unique levels. 40KB minus a handful of spare bytes and some unused code. The max the NES can support without mappers. Modern NES homebrew and demoscene can do fancier stuff with this budget given the extra decades of learned tricks, but for the state of console gaming in 1985, SMB1 is damn impressive.
Also remember all of that was ROM, the NES had a mere 2 kilobytes of RAM for all your variables and buffers.