
Will need to wait for real benchmarks, but based on OpenAI's marketing, Instant is their latency-optimized offering. For a voice interface you don't actually need high tok/s because speech is slow; time to first token matters much more.
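A quick back-of-the-envelope check of that claim (the speech-rate and words-per-token figures below are my own ballpark assumptions, not from the comment):

```python
# Rough sanity check: how fast does a model need to decode to keep up
# with speech? Assumed ballpark figures: conversational speech runs
# around 150 words per minute, and one token is roughly 0.75 words.
WORDS_PER_MINUTE = 150
WORDS_PER_TOKEN = 0.75

tokens_per_second = WORDS_PER_MINUTE / WORDS_PER_TOKEN / 60
print(f"{tokens_per_second:.1f} tok/s")  # ~3.3 tok/s
```

A few tok/s is well below what current models decode at, so once audio starts playing, generation keeps up easily; the wait before the first token is what the user actually feels.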

Apple seems to be pushing for accessibility and volume: cheaper phones, Mac minis, and the entry-level Mac that will be introduced on Wednesday.

Tbh, feels like the market is pushing for that and Apple is responding.

Exceptionalism says we have best of everything, including idiots.

What if, and hear me out here, "You don't have to"

No, what he is saying is that benchmarks are static and there is tremendous reputational and financial pressure to make the benchmark numbers go up. So you add specific problems to the training data... The result is that the model is smarter, but the benchmarks overstate the progress. Sure, there are problem sets designed to be kept secret, but keeping secrets is hard given the fraction of planetary resources we are dedicating to making the AI numbers go up.

I have two of my own comments to add to that. First, there is a problem-alignment issue at play: the benchmarks are mostly self-contained problems with well-defined solutions and specific prompt language, while human tasks are open-ended, with messy prompts and a lot of steering. Second, it would be interesting to test older models on brand-new benchmarks to see how they compare.


> No, what he is saying is that benchmarks are static and there is tremendous reputational and financial pressure to make the benchmark numbers go up.

That's a much better way to say it than I did.

These models are known for being open weights, but they're still products that Alibaba Cloud is trying to sell. They have product managers and PR and marketing people under pressure to get people using them.

This VentureBeat article is basically a PR piece for the models and Alibaba Cloud hosting. The pricing table is right in the article.

It's cool that they release the models for us to use, but don't think they're operating entirely altruistically. They're playing a business game just like everyone else.


There should be a way to turn the questions we ask LLMs into benchmarks.

That way, we can have a benchmark that is always up to date.


There are a few “updating” benchmarks out there. I periodically take a look at these two:

https://swe-rebench.com/

https://livebench.ai/


You can chain normalized quaternions to combine or diff transformations. For example, you can "subtract" the desired attitude quaternion from the predicted attitude quaternion (by multiplying one by the other's conjugate, not by componentwise subtraction) to get an attitude-error quaternion, which you can then feed to control algorithms designed for driving that error to zero. This is even more important when multiple frames of reference are involved, as quaternions can be used to transform between them.
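A minimal sketch of that diff in plain NumPy (Hamilton product, [w, x, y, z] ordering; the function names are mine):

```python
import numpy as np

def q_mul(a, b):
    # Hamilton product of two quaternions in [w, x, y, z] order.
    # Chaining rotations = multiplying their quaternions.
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def q_conj(q):
    # Conjugate; equals the inverse for unit quaternions.
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def attitude_error(q_desired, q_predicted):
    # The "difference" rotation taking predicted attitude to desired.
    # Identity [1, 0, 0, 0] means zero attitude error.
    e = q_mul(q_desired, q_conj(q_predicted))
    return e / np.linalg.norm(e)
```

When predicted equals desired, the error comes out as the identity quaternion, which is exactly the fixed point a "drive the error to zero" controller regulates toward.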


Sure, and there are a ton of ways of shifting income around. For example, selling patents to a subsidiary in a lower-tax jurisdiction and then paying for their usage. Another example is Hollywood accounting, where productions pay exorbitant rates for equipment and catering to affiliated companies. This inflates the costs, so the movies end up unprofitable despite smashing the box office.
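Toy arithmetic (all numbers entirely made up) showing how the trick zeroes out the production's paper profit:

```python
# Made-up figures illustrating Hollywood accounting: the production
# "loses" money on paper while the studio group keeps the spread.
box_office_revenue = 500_000_000
actual_production_cost = 200_000_000
# Inflated charges paid to affiliated equipment/catering companies,
# i.e. money that never leaves the corporate group.
affiliate_charges = 350_000_000

reported_profit = box_office_revenue - actual_production_cost - affiliate_charges
group_take = box_office_revenue - actual_production_cost

print(reported_profit)  # negative: the hit "lost money" on paper
print(group_take)       # what the group as a whole actually keeps
```

Anyone owed a cut of "net profits" on the production gets a share of the negative number, while the affiliated companies book the inflated charges as their own revenue.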

Huh, that is kind of amazing. It turns out the problem is that we got way too good at scaling semiconductor density.

It could be as simple as links. People drop links in the Slack discussions, and other people from geolocated IP addresses (or the same one) click on them. Google Analytics et al. hoovers up a lot of data.


Or perchance it is the other way around: the word started as an official term and over time got a shady connotation, because you can't trust Big Government.


As in "schematic"

