
bentcorner · 2026-03-29T04:15:40 1774757740

Exactly - I'm thinking the bad spatial navigators have a higher probability of washing out of driving and pursuing some other career. They may not say "I'm bad at figuring out where I am", but the economics of the job are just a little bit worse for these people.

bentcorner · 2026-03-28T16:13:44 1774714424

I don't think this is where the problem lies. If you kill someone with intent, it's murder. But the whole system needs to prove that you killed someone with intent beyond a reasonable doubt, and a DSL will not help you there.

bentcorner · 2026-03-28T16:08:11 1774714091

This could in theory already happen without any tech, but I suspect since the government is pretty monolithic, any changes in a specific law are all being done by the same set of people.

You might not have merge conflicts but I imagine you could end up with conflicting guidance from two separate pieces of law (e.g., law A says you must wear green on St. Patrick's day, law B outlaws green pajamas).

bentcorner · 2026-03-26T04:56:00 1774500960

I think the right solution is to endow the LLM with just enough permissions to do whatever it was meant to do in the first place.

In the customer service case, it has read access to the data of the customer who is calling, read access to support docs, write access to create a ticket, and maybe write access to that customer's account within reason. Nothing else. It cannot search the internet, it cannot run a shell, nothing else whatsoever.

You treat it like you would an entry level person who just started - there is no reason to give the new hire the capability to SMS the entire customer base.
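A minimal sketch of that idea, assuming a deny-by-default allowlist. The tool names, `ScopedAgent`, and `ToolNotPermitted` are all hypothetical and not taken from any real framework:

```python
from dataclasses import dataclass, field

# Hypothetical tool names for the customer service case described above.
CUSTOMER_SERVICE_TOOLS = frozenset({
    "read_customer_record",
    "read_support_docs",
    "create_ticket",
    "update_customer_account",
})


class ToolNotPermitted(Exception):
    """Raised when the model requests something outside its allowlist."""


@dataclass
class ScopedAgent:
    customer_id: str  # the customer who is calling
    allowed_tools: frozenset = CUSTOMER_SERVICE_TOOLS
    audit_log: list = field(default_factory=list)

    def call_tool(self, tool: str, target_customer_id: str) -> str:
        # Deny by default: anything not explicitly listed is refused.
        if tool not in self.allowed_tools:
            self.audit_log.append(("denied", tool))
            raise ToolNotPermitted(tool)
        # Scope access to the calling customer only.
        if target_customer_id != self.customer_id:
            self.audit_log.append(("denied", tool))
            raise ToolNotPermitted(f"{tool} on another customer")
        self.audit_log.append(("allowed", tool))
        return f"{tool} ok for {target_customer_id}"
```

The point of the sketch is that the check lives outside the model: a request for a shell or for another customer's record fails no matter what the prompt says.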

bentcorner · 2026-03-25T01:16:56 1774401416

> People are free and probably do this because it is slow. Alternatives often are not a bad thing.

Alternatives are always good but IMO brew is just not something I interact with all that much and to me it's "good enough". It works and does what I expect, although to be fair maybe I'm on the happy path <shrug>.

bentcorner · 2026-03-25T01:04:44 1774400684

TikTok is arguably a Facebook killer.

Roblox is in some ways there, and I think Epic thought Fortnite could have competed. IMO they made a strategic mistake in shackling their game-as-a-platform to Fortnite. I thought the Fortnite music thing looked interesting, but I have negative interest in installing Fortnite.

Call it something else and make it literally the first thing you see on epicgames.com, have it work on mobile, and maybe things would be different today.

(Aside: Roblox wins because I can go from typing roblox.com into my browser to playing a game with a friend in under 20s)

bentcorner · 2026-03-25T00:45:31 1774399531

Yeah this one is a classic: https://youtu.be/8OzZxjqKG10

bentcorner · 2026-03-16T14:09:47 1773670187

Makes me wonder if there's a bet you can take on Polymarket that Polymarket will get shut down due to it negatively influencing behavior. The insider trading on that one should get interesting.

amelius · 2026-03-16T15:43:25 1773675805

Will it pay out if it is shut down though?

mr_00ff00 · 2026-03-16T17:10:54 1773681054

1. I believe you can bet on this

2. If it’s only banned in the US, yes it pays out, you just need to get a VPN or go to another country.

Also, even if it's banned everywhere, the markets are blockchain contracts, so you should be able to access them without the website, which is just the frontend. (This is where my technical expertise breaks down; someone who knows blockchain would be a better expert.)

bentcorner · 2026-03-13T00:10:27 1773360627

I just say something like "spawn an agent to review your plan" or something to that effect. "Red/green TDD" is apparently the nomenclature: https://simonwillison.net/guides/agentic-engineering-pattern...

I've also found it to be better to ask the LLM to come up with several ideas and then spawn additional agents to evaluate each approach individually.

I think the general problem is that context cuts both ways, and the LLM has no idea what is "important". It's easier to make sure your context doesn't contain pink elephants than it is to tell it to forget about the pink elephants.

collinmanderson · 2026-03-13T16:26:06 1773419166

> "Red/green TDD" is apparently the nomenclature

From your link:

> what "red/green" means: the red phase watches the tests fail, then the green phase confirms that they now pass.

> Every good model understands "red/green TDD" as a shorthand for the much longer "use test driven development, write the tests first, confirm that the tests fail before you implement the change that gets them to pass".
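A tiny sketch of the two phases described in the quote. `slugify` and `run_test` are hypothetical names made up for illustration:

```python
def run_test(slugify_impl):
    """Return True if the given implementation passes the test."""
    try:
        assert slugify_impl("Hello World") == "hello-world"
        return True
    except (AssertionError, NotImplementedError):
        return False


# Red phase: the test is written first and the implementation is a stub,
# so the test is expected to fail.
def slugify_stub(text):
    raise NotImplementedError


red = run_test(slugify_stub)


# Green phase: implement the change, then confirm the same test now passes.
def slugify(text):
    return "-".join(text.lower().split())


green = run_test(slugify)
```

Watching the red phase fail first is what shows the test is actually exercising the new code.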

bentcorner · 2026-03-09T12:50:52 1773060652

Unfortunately the only way this changes is if a company writes a ToS that is just unreasonable enough, someone violates it in just the right way, the company decides to enforce said ToS, the user fights back, and it all ends up in court.

I'd be surprised if all those stars align anytime soon.