And if, as OP says, it’s necessity and sufficiency we’re testing—whether or not there were also other reasons contributing to your exam failure, wouldn’t failing that one necessary condition be sufficient to fail the outcome?
I’d hate for the “government-name” verification to become a requirement, but I’ve long wished services would at least offer that as an optional add-on. For certain important accounts, I’d be eager to place my government identity on file with the company ahead of time.
The Americans have done something kind of interesting along those lines: an in-person IDV option for establishing e-government accounts [0]. You start account setup online, then take a barcode to a post office along with your identity documents.
I have to imagine it’s hard to make a commercial case for such a system, though… especially these days with so much momentum toward the approach I resent—that is, requiring ID checks just to be online in the first place.
And, to put a charitable gloss on it—many of your clients who protest their innocence did actually have something true-positive on them. TFA cites manufacturers claiming a 4% false positive rate in lab conditions, and protesters claiming a 15-18% FPR in field conditions [*]. Offensive and intolerable in the aggregate, yes; but as far as base odds for any specific case in front of me?
Assuming 18% of my defendants truly didn’t have drugs or drug residue on them, that still means 4 out of 5 of them did—and that the extra months of sitting in jail and fighting would lead nowhere for them. What’s 6 months of diversion by comparison?
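To make that arithmetic explicit (a rough sketch, using the protesters’ 15-18% figure from TFA as the share of positive field tests that turn out to be false, which is the reading above):

```python
def truly_positive_share(false_positive_share: float) -> float:
    """Of defendants whose field test came back positive, what
    fraction actually had drugs or residue on them?"""
    return 1.0 - false_positive_share

# Manufacturers' lab-condition claim vs. protesters' field-condition claims
for fps in (0.04, 0.15, 0.18):
    print(f"claimed false-positive share {fps:.0%} "
          f"-> {truly_positive_share(fps):.0%} truly positive")
```

Even at the protesters’ worst-case 18%, roughly 4 in 5 of the defendants in front of you really did have something on them.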
Even beyond the odds, how many of defendants’ claims are precise enough to implicate test accuracy specifically? “No, man, I swear I don’t do that stuff anymore” might be completely true and still result in a marginal-but-confirmatory lab result for the swab of the ol’ contraband satchel or whatever.
Seems to me that’s the sort of stuff that the plea-negotiation part is for: instead of hoping expensive and slow science will give you certainty (or context), it’s “hey prosecutor, even if the test is right, we’re talking residue not bricks… diversion and call it a day?”
I feel like, from the outside (and fed a steady diet of the Law & Order television show), people grossly overestimate the degree of certainty embedded in a criminal conviction. I’m reminded of the classic Kalven and Zeisel study (1967) [1] finding, among other things, that judges and juries who sat through the exact same trial disagreed on guilt or innocence in up to 25% of cases.
False convictions are outrageous, and I understand our individual instincts here to say “go to the mat for the truth!” But people are complicated, context is hazy, and there’s some irreducible residual rate at which any approach to justice gets it wrong. Which adds up to a lot more individuals than we’d like to assume.
One of many reasons not to confuse the person with the bureaucratic record of the person.
[*] Or, as the CNN person writes, a “91% error rate” per an NYCDOI study… guessing because there they were over-applying the tests in situations with extremely low base rates? At first I assumed this was a typical case of journalistic innumeracy, but reading through the study [0], they’re absolutely right and worth a quick aside: 91% really was the actual false positive rate the lab found in actual field tests. Surface swabs in a jail mail interdiction program, only examining the fentanyl tests, but still. It’s a wild read in all sorts of ways: there was a known reaction between the field test and chemicals in paper, but they used it for the mail anyway; the more false positives they saw, the more they panicked that all the mail had fentanyl in it. 89% of all the initial DOC field tests came back positive, 91% of those positives turned out to be false ones. The administration interpreted this to mean that all the inmates’ families were clearly “soaking envelopes in liquid fentanyl,” then mailing them to inmates to “chew.” The misconception led them all the way to trying to ban physical mail in the system.
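To put the study’s two headline numbers together (a back-of-the-envelope sketch using only the figures quoted above, over a hypothetical 1000 pieces of mail):

```python
# 89% of initial DOC field tests came back positive, and 91% of
# those positives were later found to be false.
mail_items = 1000                      # hypothetical sample size
positives = 0.89 * mail_items          # items the field test flagged
false_positives = 0.91 * positives     # flags the lab later overturned
true_positives = positives - false_positives

print(f"flagged positive: {positives:.0f}")
print(f"false positives:  {false_positives:.0f}")
print(f"true positives:   {true_positives:.0f}")
```

So out of 1000 pieces of mail, something like 80 items actually had anything on them, while ~810 families got falsely accused of mailing fentanyl.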
I might accept a 0.1% false positive rate - but 4% is unconscionable. Yes, trials are messy, but this isn’t a trial - it’s a bad test.
Tbh, your “people are complicated” stance feels like the position of someone who believes they will never be on the wrong side of situations like these. I hope you never are, but I don’t think it’s a defensible position if you’re including people in groups likely to be subjected to tests like this (e.g. prison inmates).
When you can’t enforce everything at once, you go where the most acute problems are. I imagine when your MCP avenue of abuse catches on—like this other category of harnesses did—to such a scale as to become a problem impacting us folk trying to go about our business… when that’s where the problems shift, I imagine (and hope) Anthropic will crack down on that vector too. To keep the service usable for us ordinary meatbags.
I’m glad they give us the leeway to experiment, and I’m also glad they weed the garden from time to time. To switch metaphors, I’m deeply frustrated when my very modest, commuter-grade use gets run off the figurative highway by figurative hot-rodders. It’s been extra-529y this week, and it’s about time they reined it in a little.
You’re always welcome to pay-as-you-go for as many tokens as you’d like to burn on their infrastructure… or to compute against any of the wide array of ever-improving open models on commodity compute providers…
That's an interesting way of phrasing it - so is there a way to use the quota that's not 'abuse'? MCP/Claude Code seems to be how they want you to use it - are loops or ralph abuse as well?
I take your point, the way I used “abuse” there probably carries more charge than I’d meant it to. It’s a totally valid way to use the technology, it’s “abusive” only of the subscription program. And I agree: Anthropic clearly want people to industrialize and automate usage. But that’s not what the subscription product is for. Use all the loops you want, burn all the tokens you want—just pay what they cost.
> is there a way to use the quota that's not 'abuse'?
I think my answer is “no.” In that I’ve never thought of the limits as “quotas,” and I don’t think I’ve heard Anthropic speak of them that way. Quotas are to be used up, while limits are to signal that what you’re doing is outside the envelope of acceptable use. Quotas are to be met, limits are to be avoided.
I interpret the intention of the subscription, like a membership at a makerspace, to be to allow novices to experiment with stuff, to take on personal-scale projects, to allow them to learn without having to understand the tool’s economics upfront. To play without fear of expensive mistakes.
And, like the makerspace, it can only offer generous limits to the extent that most of us rarely bump up against them. If you’re doing production runs in the makerspace, you’re crowding out the other members, and something’s gotta give.
To the extent that we do bump against the limits during “ordinary” use—and we do with Claude Code, especially those of us around here—it’s really frustrating. The limits need to rise in order for it to remain attractive to casual users like me, the economics still need to add up for the subscription program as a whole, and part of that is separating out what patterns of use belong under a different regime.
If these harnesses or OpenClaws or whatever stop making sense as soon as they have to pay their actual costs, then that’s a pretty good sign they’re abusing the spirit of the subscription.
But Anthropic seem more than happy to service those uses via the API or metered usage, and even to sweeten the deal with more reliable access and bulk discounts. I certainly wouldn’t characterize the same automated usage as “abuse” via that channel.
>>I take your point, the way I used “abuse” there probably carries more charge than I’d meant it to.
Fair enough.
>> But that’s not what the subscription product is for.
This was the point I was trying to make - I pay for XX tokens/usage. But somehow using them all is 'taking advantage'?
BTW - I'm actually not complaining about the limits - I probably only use half my tokens in an average week. I'm just annoyed at having to jump thru hoops if I want to try something 'API' oriented. For me, AI is still the new shiny - I try all different sorts of things, learning/playing. There was an article posted today about writing agent harnesses. That could be interesting - maybe I want to try my hand at it. But then I've got to mess around/pay extra to _try_ something that my subscription already easily covers.
[added:]
>>to take on personal-scale projects, to allow them to learn without having to understand the tool’s economics upfront. To play without fear of expensive mistakes.
This is exactly what I'm trying to do - however, as soon as you want to try anything 'API' oriented, the 'fear of expensive mistakes' comes right back.
It sure changes the incentives though. It’s much less attractive to leak recordings as a PR move—or realize any benefit that cranky humorless judges can trace back to the recording—if that, in and of itself, constitutes a whole new crime (and effectively confessing to it too).
Even better if we could somehow trunk my space’s 3500W of panels with the ones covering the combustion-driven car next to me. And the empty space to my other side…
If I read it correctly, this line was quoting the main victim, who described it that way (incorrectly, apparently based on a mangled secondhand interpretation of how these things work).
The thing that really stood out to me in the article was how many of the affected people assert confidently wrong understandings of the way the tech works:
> “I still use AI, but very carefully,” he says. “I’ve written in some core rules that cannot be overwritten. It now monitors drift and pays attention to overexcitement. […] It will say: ‘This has activated my core rule set and this conversation must stop.’”
I guess not too far from “the CPU is the machine’s brain, and programming is the same as educating it” or that kind of “ehhhhhhhhhhh…” analogy people use to think about classical computing.
It doesn't help that LLMs roleplay, pretending to behave however their users think they do. You think it has "core programming"? Well, it will say it does. You think it abides by the Three Laws of Robotics? Ditto.
Generally, by whether they know what’s going on at the shop. Usually if I’m calling on the phone, it’s for a specific answer that’s not gettable through a computer.
“Hey can you look out and see if Joe’s almost done with the blue Chrysler?” is an easy ask for the phone answerer at my local Joe’s shop (it’s his wife, and as a bonus she’ll also holler at him or his crew to hurry up because @alwa is waiting on it).
Contrast with the grant-funded pharmacy I use. Some management type suggested they could deal with their insane level of overwork by automating away the phones behind a hostile and labyrinthine network of IVRs. Oh, it has “AI,” but only to force choices between forks in decision trees corresponding to questions I didn’t have—and every path still eventually ends in “this voice mailbox is full, goodbye.”
After literal hours of my life trying to wrestle their IVRs into helping—I do sympathize with their workload and don’t want to be a special snowflake—I now drive 30 minutes to ask questions face to face.
In general I’ve maxed out what’s discoverable by automated means before I call. So a call center is both useless and insulting.
Responsible (directly or indirectly) for quite a few of them, mostly oldish and wheezy; I’m not myself mechanical; and we use the shop mostly for routine maintenance—rotate the tires every few thousand miles, swap the brake pads, deal with the oil changes/fluids/filters, etc.
Partly as a preventative measure: we trust them. In the rare cases when they find something, it’s real. As a consequence we get ahead of brewing problems.
Plus loyalty, to some extent; we try to throw work their way when we can, even if we probably could handle it ourselves. The relationship between our families goes back a good 60 years by now.
Fully grant that my situation is unlikely to be representative. And no shade toward OP—it sounds like a cool project thoughtfully done, and a real improvement over the status quo for her relative!
Plus, maybe the customer would prefer to support a business that invests in and employs from the local community, even if it costs a little more. Or they see it as a quality signal. If I call a plumber who outsourced their reception to a call center to save a few bucks, I'm starting to think, "What else is he willing to do to save a few bucks?"
> the friction is part of the joy of creation for them
I’d extend that to suggest—based on conversations with the artists in my life, anyway—that for many, the friction along the path from an idea to a work is where the art happens in the first place. That the art happens in the additions and subtractions and judgments the artist makes along the way as they bring the artifact into being. That without that, it’s something closer to manufacturing.
I’m reminded of how we around here grumble at piles of vibe-coded slop, even if they notionally solve the users’ problems at hand. It’s not strictly that “it’s insufficient at satisfying the problem brief,” it’s that it’s missing all the other latent considerations—structure, coherence, legibility, maintainability, determinism, good judgment—that a skilled code craftsperson would have worked in along the way almost without thinking.
Depressing for artists of code itself—liberating for the people whose artistic practice is maybe one level of abstraction up—whose obsession is iterating through “finished” products til they fit just so, til they reflect the high-level intention just right. For whom the code part was always an annoying-but-necessary slog, akin to, as another commenter said, grinding the snails for pigment…
“I dread what it means for the code base at work, but damn if I’m not cranking out every single side project I’d never gotten around to…”
I am a pro artist and you have it exactly. There are many, many decisions along the way from a rough sketch to a finished work, and making them is a lot of the fun. Part of making these decisions is also turning a lot of your brain off while you draw, and vaguely thinking about where to go once you finish what you've done so far.
Serendipity's part of it too, like I could see the "waterfall teapot" starting with just idly modeling a teapot with no particular goal in mind, then accidentally stretching the mouth too wide, laughing at the result, and deciding to experiment with a bunch of absurdly-wide teapots until arriving at the final result.
I notice that your comment history is all rapid-fire three-paragraph LLM responses. You do appear knowledgeable and respond quickly, but I've just dumped 10 minutes of my life into your attention in order to verify, parse, and filter through your responses.
I can't tell whether you're a person who thought about something. Therefore, I can't tell whether, for example, https://news.ycombinator.com/item?id=47393311 is an analysis I should take seriously (as I might, if it were spoken from experience) or just Markov-chain, Reddit-trained hypothetical fluff.
How can we increase the friction to presumptively exclude you, but provide accommodation if, for example, you're more comfortable in your native language and using the LLM mainly to bring your English writing to a level consistent with your personal expertise?
> I notice that your comment history is all rapid-fire three-paragraph LLM responses
I looked after you said this, and those are all from today, in the last hour. And it's a stark change from their (very short) comment history.
In particular these two comments are extremely suspicious [0,1]. I think even if they're not LLM generated, it highlights something likely wrong, which paseante themselves states!
>> a long, detailed response in Slack implied the person had spent time thinking
There's 2 minutes between these comments, on different threads (I also noticed they did similar things in a few threads as I typed this out). While the timing is reasonable for the amount of words written, it does not seem adequate for reading the article and/or other comments. Personally, I find that kind of behavior rude, as it enshittifies the social space the rest of us are in [2].