My perspective aligns with this: I used to obsess over the Best Model, defined as "top of the benchmarks", which in practice also meant Biggest, Slowest, and Most Expensive.
Then I gave two models a Real World Task.
The "Best" model took 3x longer to complete it, and cost 10x more. [0]
Now I define Best Model as "the smallest, fastest, cheapest one that can get the job done". (Currently happy with GLM-4.7 on Cerebras; at least, I would be if the unlimited plan weren't sold out ;)
I later expanded this principle when model speed crossed into the Interactive domain. Speed is not merely a feature; a sufficient difference in speed actually produces a completely new category of usage.
[0] We recently arrived at an approximation of AGI which is "put a lossy solver in an until-done loop". For most tasks we're throwing stuff at a wall to see what sticks, and the smaller models throw faster.
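To make the footnote concrete, here is roughly the loop I have in mind, as a minimal Python sketch; attempt_task and is_done are hypothetical stand-ins for a model call and a verification step (tests, a linter, a judge), not any particular API:

    # "Lossy solver in an until-done loop": retry a fallible solver
    # until a checker accepts the result or the budget runs out.
    def solve_until_done(task, attempt_task, is_done, max_attempts=10):
        feedback = None
        for _ in range(max_attempts):
            result = attempt_task(task, feedback)   # lossy: any attempt may be wrong
            ok, feedback = is_done(task, result)    # verify, collect feedback for retry
            if ok:
                return result                       # accept the first passing attempt
        raise RuntimeError("budget exhausted before a passing attempt")

In this framing the smaller model may lose more individual throws, but each throw is so much cheaper and faster that it can still exit the loop first.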