steinsgate · 2026-02-09T13:42:32

We found something surprising about ARC AGI 2, the benchmark that aims to measure human-like fluid intelligence: just enabling a stateful Python tool boosts performance across models. We saw a more than 4x performance improvement with GPT OSS 120B (high). The effect continues well into frontier territory (GPT 5.2) with double-digit gains.

We aren't sure whether these gains happen because code execution is a stronger form of verification compared to pure CoT or because it encourages qualitatively different thinking patterns.

Another interesting finding: interleaved thinking, the model capability behind these gains, seems fragile at the infra/client layer. Soft failures can make capable models look much worse than they actually are.
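For readers unfamiliar with the term, here is a minimal sketch of what "stateful" means for a Python tool: variables defined in one call are still there in the next. The class name `StatefulPythonTool` and its `run` method are hypothetical, made up for this illustration; this is not the harness used for the results above, and plain `exec` provides no sandboxing.

```python
import contextlib
import io
import traceback


class StatefulPythonTool:
    """Toy code-execution tool whose variables persist between calls."""

    def __init__(self):
        # One shared namespace for the whole session: this is the "state".
        self.namespace = {}

    def run(self, code: str) -> str:
        """Execute code and return captured stdout, or the traceback text."""
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                # NOTE: exec is not sandboxed; illustration only.
                exec(code, self.namespace)
        except Exception:
            # Return the error as text so the caller can react to it.
            buffer.write(traceback.format_exc())
        return buffer.getvalue()


tool = StatefulPythonTool()
tool.run("grid = [[0, 1], [1, 0]]")
# A later call can still see `grid` because the namespace is shared.
flipped = tool.run("print([row[::-1] for row in grid])")
```

A stateless tool would raise `NameError` on the second call, forcing the model to resend all of its setup code every turn.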

steinsgate · on Nov 14, 2019

Just wanted to mention that there are indications that universal constants (e.g., the fine-structure constant and the gravitational constant) vary over time [0].

[0] https://en.wikipedia.org/wiki/Time-variation_of_fundamental_...

steinsgate · on Nov 20, 2016

Ask HN threads contain a lot of valuable information. So I decided to categorize and organize this information in a GitHub repo. Would appreciate your feedback on the following:

1. Do you find this to be of any value?

2. Any suggestions on what you would like to see in such repositories?

steinsgate · on Nov 17, 2016

Hi everyone. I recently created Ask HN summaries. It is a Medium publication that turns an Ask HN thread into a blog post and an accompanying GitHub repository. I would love to hear your feedback on it.

steinsgate · on Nov 9, 2016

I have been working on an NLP project where I needed to identify different forms of the same word. Typically, this is done by stemming and lemmatization. These methods are not accurate, and I needed high accuracy in my project. Since I found no libraries/packages that can do this, I decided to write a Python package myself. It works quite well now. It is also trending in /r/python. Feel free to check it out, I would love to hear your feedback.
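To illustrate the kind of inaccuracy meant here, below is a deliberately crude suffix-stripping stemmer. It is a toy written for this example (`toy_stem` and `SUFFIXES` are hypothetical names), not the Porter algorithm, not any library's API, and not how the package works. Real stemmers are more careful, but irregular forms such as "ran" and derived forms such as "runner" remain hard for rule-based stripping.

```python
# Suffixes tried in order; longer ones first so "ing" wins over "s".
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Five forms of the same word that we would like to group together.
related = ["run", "runs", "running", "ran", "runner"]
stems = {word: toy_stem(word) for word in related}
# stems == {"run": "run", "runs": "run", "running": "runn",
#           "ran": "ran", "runner": "runner"}
```

Only "run" and "runs" end up grouped. The same rules also turn the unrelated word "news" into "new", so the approach both misses related forms and merges unrelated ones.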

steinsgate · on Nov 8, 2016

I have been working on an NLP project where I needed to identify different forms of the same word. Typically, this is done by stemming and lemmatization. These methods are not accurate, and I needed high accuracy in my project. Since I found no libraries/packages that can do this, I decided to write a Python package myself. It works quite well now. Feel free to check it out, I would love to hear your feedback.

butterm · on Nov 8, 2016

Thanks for doing this. I wanted to do the same thing a few months back. I looked into a lot of dictionary APIs, but as you mention in the repo, they suck at connecting different parts of speech. It's funny how simple this sounds but how difficult it is to actually do. Back then, I gave up and went with a lemmatizer. Will definitely use this.

steinsgate · on Oct 13, 2016

If someone had told me that there would come a day when two of the first three HN posts would be about Bob Dylan and Leonard Cohen, I would have dismissed them as high or delusional. Turns out that I would have been very wrong.

bertiewhykovich · on Oct 13, 2016

[flagged]

dang · on Oct 13, 2016

Please don't comment like this here.

neom · on Oct 13, 2016

printing this comment and framing it for my office wall.

shitgoose · on Oct 14, 2016

please, send me a copy. now I have to read it.

steinsgate · on Oct 8, 2016

Nice work! You said that you avoided machine learning because labeled data is hard to find. What about unsupervised approaches?

Frankly speaking, I am a bit skeptical about pattern matching algorithms for answering questions. It would help if you showed some kind of stats about your algorithm's performance on a diverse question set. For example, you can scrape simple quiz questions (and answers) from quiz sites [1] and report back on the performance.

[1] http://www.quiz-zone.co.uk/questionsbydifficulty/1/0/answers...
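A minimal sketch of the kind of report suggested above: run the answering function over question/answer pairs and print the fraction it gets right. Everything here is an assumption made for illustration: `toy_answerer` stands in for the real algorithm, the three sample pairs are made up rather than scraped from any site, and exact match after normalization is only one possible scoring rule.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())


def accuracy(answer_fn, dataset) -> float:
    """Fraction of (question, expected) pairs answered with an exact match."""
    if not dataset:
        return 0.0
    correct = sum(
        normalize(answer_fn(question)) == normalize(expected)
        for question, expected in dataset
    )
    return correct / len(dataset)


# Made-up sample pairs; a real evaluation would use a large, diverse set.
sample = [
    ("What is the capital of France?", "Paris"),
    ("How many days are in a week?", "7"),
    ("What colour is the sky on a clear day?", "Blue"),
]


def toy_answerer(question: str) -> str:
    # Hypothetical stand-in for the question-answering algorithm under test.
    return "paris" if "France" in question else "unknown"


score = accuracy(toy_answerer, sample)
print(f"accuracy: {score:.2f}")
```

Breaking the score down by question category or difficulty would show where pattern matching holds up and where it fails.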

steinsgate · on Aug 25, 2016

What's the value of WhatsApp? It is the scale of the product. I use it because most of my friends are on it. There are many other alternatives to WhatsApp as pointed out by many in this thread, but none of them have nearly the same traction.

The only business model that allows such scale is the ad supported model. If WhatsApp was subscription based, I am sure they could not have achieved this scale. And I wouldn't have found value in it.

I realize that the ad supported model is what indirectly adds so much value to WhatsApp. Therefore, it makes sense to me to support that model if the model is reasonable enough. They have end to end encryption, which means my content is safe from prying eyes. That's already huge. So I really don't mind if they share my number with Facebook.

mercer · on Aug 25, 2016

I think the 'advantage' of WhatsApp is not as big as it seems. I already use Telegram with most people that I communicate with regularly who use WhatsApp for everything else.

Almost every one of those people grumbled at first about having to install another app. But because of how the dominant phone UIs work, it turns out to be almost frictionless.

I barely notice whether I get a message through WhatsApp, Telegram, or Facebook Messenger, because I either tap on the notification - and go right to the app and they all look basically the same - or I reply from the notification itself.

There's one friend who is so used to WhatsApp that he'll generally initiate conversations through that, even though I tend to initiate or further conversations in Telegram. Neither of us ever made a comment about it.

I hope I'm right, which means Facebook's stranglehold on communication and personal data might not be as strong as it appears. But I suspect they'll soon roll out more WeChat-like features such as payment, apps and bots that are a lot stickier.

For example, I've been playing around with a number of Telegram bots, and for the first time since I started using all these different chat apps, I am bothered when I have to use a non-Telegram app, because it doesn't support my bots (ranging from silly gif-search-bots to more useful poll-bots). Payment integration would be even 'stickier'.

Aissen · on Aug 26, 2016

> If WhatsApp was subscription based, I am sure they could not have achieved this scale. And I wouldn't have found value in it.

WhatsApp dropped its $1 annual subscription fee just this year: https://blog.whatsapp.com/615/Making-WhatsApp-free-and-more-...

sumitgt · on Aug 25, 2016

WhatsApp is Ad supported? I don't think so. I believe until last year they were subscription based.

jcfrei · on Aug 25, 2016

They were, and sometime last year they turned everybody's subscription into a lifetime subscription. I think it's only a matter of time until WhatsApp shows ads. I see more and more content being shared through it - eventually it will rival Facebook.

steinsgate · on Aug 25, 2016

It is a free service as of now (at least for me; initially they said I would have to pay after a year, but this never happened). Technically, they don't even have a business model yet. What they have is scale. I think they are afraid that a subscription model will make users churn and they would end up losing the scale. The only other alternatives are an ad-supported free model or a freemium model. I am wondering if a freemium model would make sense for WhatsApp.

steinsgate · on Aug 16, 2016

Another TL;DR: Making mistakes is a great way of learning. To learn from mistakes, the first step is to identify that you have made an error. The second step is to correct it. As we grow older, we become worse at the first step, i.e. identifying the error. Without identification, there can be no correction and no learning.