Very cool project! Reminds me of Chiang's great short story 'The Truth of Fact, the Truth of Feeling':
> “If you speak slowly, you pause very briefly after each word. Thatʼs why we leave a space in those places when we write. Like this: How. Many. Years. Old. Are. You?” He wrote on his paper as he spoke, leaving a space every time he paused: Anyom a ou kuma a me?
> “But you speak slowly because youʼre a foreigner. Iʼm Tiv, so I donʼt pause when I speak. Shouldnʼt my writing be the same?”
Phrasal verbs are listed under the main verb. I've never had a problem with that. Even as a native speaker, I sometimes still have to look one up when it shows up in a strange context.
These particular examples are figures of speech, so "shut" in "shut up" still means the same thing it would mean in "shut the door." And "up" is used the same way as in "cover up."
So the issue is just that this is figurative language, and you have to know that a kickoff is the beginning of certain sports, for example. It's more of a cultural issue than something a dictionary needs to fix.
They don't get into enough learning lists, and from my perspective, they are great additions to word games because the more transparent compounds are unique and legit words that can more than double the accessible vocabulary.
I'm curious, why did you make this change? The article isn't about Oregon. Plus, the claim itself is pretty disingenuous since it includes COVID data, vastly exaggerating the effect.[0]
I quoted from the middle of the article what I found interesting. I don't submit on here much and I just put that in the title field without thinking. I'm glad that comment provides proper context on this.
Anthropic is leaning heavily into agentic coding, so it makes sense to use SWE-bench Verified as their main benchmark. It is also the one benchmark where Google did not get the top spot last week. Claude remains king; that's all that matters here.
Maybe somewhere in the original comment it would have been fair to mention that you can barely see the house in the original photo. This is actually a hilarious complaint.
That cannot be a valid excuse. Other than adding extra windows to the clearly visible wall, it's obvious that the model is perfectly capable of "seeing" the house. It just cannot "believe" that there can be a big empty wall on a garden house.
It leads on ARC-AGI-1 with Gemini 3.0 Deep Think, which uses "tool calls" according to Google's post, whereas regular Gemini 3.0 Pro doesn't use "tool calls" for the same benchmark. I am unsure how significant this difference is.
Very interesting to hear two technologists at a tech business conference say things along the lines of: "our tools do not merely extend us, they transform us", followed up with "we've become numb to the devastating consequences of technology".
(I know I'm somewhat selectively reading but still)
Interesting, because games are exactly the kind of RL environments that models can learn effectively, but the catch is that they must do this learning on the fly at test time. Very exciting to see this.