Wilfred, please correct your statement that the benchmarks game requires "all the test programs for the same language to be identical".
It isn't true. It wasn't true 4 years ago.
For sure, my preference was to show PyPy programs that also worked with CPython; that made it clear that optimizing for PyPy could make performance worse with CPython, and vice versa.
Benchmarks Game programs for the same language are not required to be identical. Wilfred Hughes has been asked to correct that misstatement.
>>"It’s also not clear how representative the test programs are of typical performance of that language."<<
Without sampling programs "in the wild" how could anyone possibly claim that other programs were "representative"?
See http://research.microsoft.com/en-us/projects/jsmeter/