
Entirely AI. I just can't with this style anymore.

Brother-in-law graduated med school in the early 90s and has been a practicing ER physician ever since. We discussed this recently and he related that his advisors told him not to go into radiology back in the late 80s because the assumption was that computers were going to take over the field. He's not far from retirement, and it's only now that we're starting to see signs of that 30-plus-year-old prediction coming true.

As others in the thread note, there are plenty of concerns around operational use of AI solutions in the medical space, but radiology has a much larger target painted on it than other practices, as a fair portion of the job (but certainly not all!) can boil down to high-skill pattern recognition from visual inputs. The current list of AI-enabled devices going through FDA approval is public; more than 3/4 of the entries target radiology use cases: https://www.fda.gov/medical-devices/software-medical-device-...


The issue with radiologists is that on average they are able to spot ~35% of correct diagnoses, while the world's best radiologists spot ~45%. AI might get us to ~50%, which is ~15 points better than an average radiologist (who still needs to review it).

And you are going to provide the references that support this opinion, so we can elevate it to a fact...

It's fine to ask for sources. It's also fine to not give sources when relaying information in freeform comments. It's not fine to ask for sources in the tone you are using, though, as though you are annoyed and simply expect sources to always be included with claims. There are better ways of accomplishing your goals.

Someone drops very specific percentages about diagnostic accuracy... numbers that, if true, have serious implications for patient outcomes, and your concern is that I did not ask nicely enough for a source? I could not think of a more HN-typical response...

I did not even call the claim false, even if it almost deserves it... I said, essentially, let's see the references so we can treat this as fact rather than opinion.

What you did is write a longer and more prescriptive comment about my tone than anything anyone has written about the actual substance :-)). You tone-policed a one-line request for evidence while giving a complete pass to unsourced medical statistics presented as fact.

If we are ranking things that erode discourse quality, I would say you are higher on the list.




> Calm down

You had a point until you did that.


Nah there's no reason to just accept someone's outbursts and not call them out for unsolicited high emotion lol

You can’t complain about somebody’s tone/call them passive aggressive and then use intentionally inflammatory language like “calm down.”

Three comments in... and you still have not said a single word about whether radiologists actually catch 35% of diagnoses. But you have found time to call me passive aggressive, entitled, lazy, and immature. For one sentence. Asking for a source...

You are now, multiple comments deep, doing the thing you accuse me of...being more invested in tone than substance.

The irony is genuinely impressive at this point.


If you look at early stage diseases it's probably even way less than 35%...



The lack of self-awareness you display is impressive. Grade A troll or bot. As someone who sometimes misses things, I find it mildly interesting when someone is so confidently not on the same page as others. Good luck.

Very unspecific. Zero-value comment.

If you give specific numbers then I expect sources. If you make incredibly bold claims then I also expect sources.

It's one thing to talk casually, in which case I agree with you. But as soon as hard numbers are on the table, it's no longer casual, and if you do not provide sources then the assumption has to be that you pulled the numbers out of your ass and you are not to be trusted.

To get around that, just don't provide numbers and don't speak authoritatively. It's very easy, I don't know why people speak authoritatively if they know they can't back it up.


The earth is about 24,900 miles in circumference

"SOURCE?"

There's a middle ground here, a grey area, that you seem to be pretending is obviously navigated. You're speaking pretty authoritatively on this, by the way. Do you have the moral, propositional-logic, and epistemological justification for these claims?


I’m not sure I find this to be a comparable example.

If someone was making an important calculation or decision based on the circumference of the earth, then they would likely want the number cited/confirmed and not just thrown out by a random person that doesn’t pass the smell test. “Radiologists are only right 35% of the time” does not pass the smell test and a cursory search makes the case even worse.


I didn't make any claims; all of that is my opinion. There are literally no claims there. I just said that people who spew out numbers and then can't provide a source aren't trustworthy - that's an opinion.

And there's obviously a difference between an established and obvious fact and a BOLD claim. This person made a BOLD claim. And provided numbers. To me, that requires a source.

Yes, there is a middle ground, but this isn't in the middle ground. I think this type of claim requires a source. A different claim, without specific percentages, would not. Or an obvious claim, like the Earth's circumference, also would not.


Maybe "radiologist" means something different in my country, but here radiologists don't diagnose (I mean, except when you see them for a broken bone or something); oncologists do. I did an observation internship with a radiologist when I was 20 (95% of my family are doctors/nurses/PTs; I wanted to know what a degree in physics could let me do in the field, and radiology was the only path into medicine from my initial training where I only lost a year, not two). You spend your time calculating doses, pulling up patient history, and calibrating machines; it's much more a technician role than an MD's. In any case, even if radiologists in the US do diagnose cancer, that's such a small part of their job it shouldn't matter.

^ Knowing this, I believe the best course of action for a hospital administrator would be to implement a "blind workflow" to reduce risk & lawsuits.

A radiologist and an AI should each review a scan independently, and the two results should then be combined for review.
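A minimal sketch of what that combination step could look like, assuming each read is reduced to a set of flagged findings (the function and field names below are hypothetical, not any vendor's API):

    # Hypothetical sketch of a blind double-read workflow: the radiologist's
    # read and the AI's read are produced independently, then merged. Any
    # disagreement is escalated for adjudication rather than silently resolved.
    def combine_reads(radiologist_flags: set[str], ai_flags: set[str]) -> dict:
        agreed = radiologist_flags & ai_flags
        disputed = radiologist_flags ^ ai_flags  # findings only one reviewer flagged
        return {"report": sorted(agreed), "needs_adjudication": sorted(disputed)}

    # Example: the AI flags a nodule the radiologist did not mention.
    print(combine_reads({"fracture"}, {"fracture", "nodule"}))
    # {'report': ['fracture'], 'needs_adjudication': ['nodule']}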


I have seen very conflicting data on this. You shouldn’t state it so confidently.

I assume the numbers are made up as an example.

I worry that rational takes like this end up completely lost in the battle between motivated parties who yell far louder, but have minimal investment in actual outcomes for those who will be depending on these technologies. The debate over self-driving vehicles is another example.


Where are you getting these numbers? Even a cursory search doesn’t put the numbers anywhere near such poor performance by real people.

AI at 50% would be notably worse (also where are you getting that number?)


From radiologist AI training datasets, evaluated long-term/post-mortem.

Sauce or gtfo

I hate to be “source?” about it but your numbers are so far off what every search result is showing.

I am not saying those are for all diagnoses, but for some tricky yet important ones (i.e. detecting them early might save your life).

You did not give specificity of any kind until now, and now I’m even more curious where these numbers are coming from.

Some data (average radiologist score):

Early-stage lung cancer (via chest X-ray): 33.3%

Clinical staging of stage I pancreatic cancer (via CT, MRI, EUS): 21.6%

Breast cancer (via mammography in dense tissue): 30%

Cuneiform fractures (foot, X-ray): 0%

Midfoot fractures (general, X-ray): 12.5%

Cuboid fractures (X-ray): 14.29%

Navicular fractures (X-ray): 22.22%

Talus fractures (X-ray): 21.43%

Individual radiologists often scored 5% in those as well. The skill distribution is brutal.


If your original argument was “it could be useful for more difficult/niche observations” then I think most of us wouldn’t have objected.

I also really don’t understand why you still aren’t sharing any links. Is this all LLM-generated without citations or something? Where are you getting your numbers?


Persuade someone to run a prospective trial and show the outcomes. Everything else is bullshit

You’re mixing up “using” with “copying”. You are allowed to “use” all of a book or movie or code by listening to or watching or reviewing the whole thing. Copyright protects copies. The legal claim here is that training an LLM is sufficiently transformative such that it cannot be construed as a copy.

I replied to someone saying that it’s fair use, which presupposes that it’s a derivative work.

Amen.

People seem to struggle with the concept of private datacenters these days. Palantir customers tend to be the sorts of orgs that are pretty paranoid about their data, and they wouldn't be handing it over to some schmucks without being confident that those concerns were addressed. Militaries and governments generally aren't fuckin around with things like intelligence data, so I think it's reasonable that Palantir is able to make a convincing case to the world's most paranoid orgs that their data isn't being sent anywhere (and it'd likely be air gapped anyway).

Just because everything you touch is in the cloud doesn't mean other orgs aren't still building their own datacenters and then buying software to run inside.


I'm not sure if you're familiar with the work from the lab of Mike Levin at Tufts but I'm betting you'll find it interesting if not. Here's a taste https://pmc.ncbi.nlm.nih.gov/articles/PMC6923654/

While I disagree with your notion that this is explicitly due to gravity, the rest of your argument seems to align with some of this lab's work. Learning can be demonstrated on scales as low as a few molecules, way below what we would normally call "life".


And then an 18-to-20-something-year training run is required for each individual instance.

I know right, such a waste. Plus it's so random on how they will turn out!

Any suggestions on how to reduce that waste?


Modern LLMs, just like everyone reading this, will instead reach for a calculator to perform such tasks. I can't do that in my head either, but a Python script can, so that's what any tool-using LLM will (and should) do.
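As a rough sketch of that division of labor (not any particular vendor's tool-calling API; the request format here is made up), the model emits a structured request and the harness does the exact arithmetic:

    # Hedged sketch of a tool-dispatch step. The "model request" is faked; the
    # point is only that the exact arithmetic happens in ordinary Python
    # (arbitrary-precision ints), not "in the model's head".
    def run_tool(call: dict) -> str:
        if call["tool"] == "multiply":
            a, b = call["args"]
            return str(a * b)
        raise ValueError(f"unknown tool {call['tool']!r}")

    # Pretend the model asked for this while answering the user:
    model_request = {"tool": "multiply", "args": [123456789, 987654321]}
    print(run_tool(model_request))  # 121932631112635269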

This is special pleading.

Long multiplication is a trivial form of reasoning that is taught at the elementary level. Furthermore, the LLM isn't doing things "in its head" - the headline feature of GPT LLMs is attention across all previous tokens, all of its "thoughts" are on paper. That was Opus with extended reasoning; it had every opportunity to get it right, but didn't. There are people who can quickly multiply such numbers in their head (I am not one of them).

LLMs don't reason.


I tried this with Claude - it has to be explicitly instructed to not make an external tool call, and it can get the right answer if asked to show its work long-form.

Mathematics is not the only kind of reasoning, so your conclusion is false. The human brain also has compartments for different types of activities. Why shouldn't an AI be able to use tools to augment its intelligence?

I used the mathematics example only because the GP did. There are many other examples of non-reasoning, including some papers (as recent as Feb).

There are many examples of current limitations, but do you see a reason to think they are fundamental limitations? (I'm not saying they aren't, I'm curious what the evidence is for that.)

It's because of how transformers work, especially the fact that the output layer produces a probability distribution over tokens from which we quite literally make a weighted random choice. My hunch is that diffusion models would have a higher chance of doing real reasoning - or something like a latent space for reasoning.
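For anyone unfamiliar with that sampling step, here is a minimal illustration with made-up logits (this is the generic softmax-then-sample procedure, not any specific model's code):

    import math, random

    # The final layer yields one score (logit) per vocabulary token; softmax
    # turns them into probabilities, and the next token is drawn by a
    # weighted random choice. The numbers below are invented for illustration.
    logits = {"Paris": 5.1, "London": 3.2, "banana": 0.4}

    total = sum(math.exp(v) for v in logits.values())
    probs = {tok: math.exp(v) / total for tok, v in logits.items()}

    next_token = random.choices(list(probs), weights=list(probs.values()), k=1)[0]
    print(probs, "->", next_token)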

Thinking that LLMs are intelligent arises from an incomplete understanding of how they work or, alternatively, having shareholders to keep happy.


> Furthermore, the LLM isn't doing things "in its head" - the headline feature of GPT LLMs is attention across all previous tokens, all of its "thoughts" are on paper

LOL, talk about special pleading. Whatever it takes to reshape the argument into one you can win, I guess...

> LLMs don't reason.

Let's see you do that multiplication in your head. Then, when you fail, we'll conclude you don't reason. Sound fair?


I can do it with a scratch pad. And I can also tell you when the calculation exceeds what I can do in my head and when I need a scratch pad. I can also check a long multiplication answer in my head (casting out nines, checking the last digit, etc.) and tell if there's a mistake.
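For anyone who hasn't seen it, casting out nines is the cheap consistency check being referred to: the factors' digital roots, multiplied mod 9, must match the answer's digital root. It can catch a mistake but can never prove the answer right. A small sketch:

    # Casting out nines plus a last-digit check: cheap ways to spot an error
    # in a long multiplication without redoing it. Failing either check means
    # the answer is wrong; passing both only means it might be right.
    def looks_right(a: int, b: int, claimed: int) -> bool:
        mod9_ok = (a % 9) * (b % 9) % 9 == claimed % 9
        last_digit_ok = (a % 10) * (b % 10) % 10 == claimed % 10
        return mod9_ok and last_digit_ok

    print(looks_right(4137, 2589, 10710693))  # True  (4137 * 2589 = 10710693)
    print(looks_right(4137, 2589, 10710694))  # False (both checks fail)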

The LLMs also have access to a scratch pad. And importantly don't know when they need to use it (as in, they will sometimes get long multiplication right if you ask them to show their work, but if you don't ask they will almost certainly get it wrong).


> And importantly don’t know when they need to use it

patently false, but hey at least you’re able to see the parallel between you with a scratch pad and an LLM with a python terminal


Sure, let's test that:

https://chatgpt.com/s/t_69c420f3118081919cf525123e39598c

https://chatgpt.com/s/t_69c4215daeb481919fdaf22498fb0c4f

Do you have a different definition of false? I'm referring to their reasoning context as their scratch pad if that wasn't clear.


The context is the scratch pad. LLMs have perfect recall (ignoring "lost in the middle") across the entire context, unlike humans. LLMs "think on paper."

The conclusion that LLMs don't reason is not a consequence of them not being able to do arithmetic, so your argument isn't valid.

Also, see https://news.ycombinator.com/newsguidelines.html

"Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.

Comments should get more thoughtful and substantive, not less, as a topic gets more divisive.

When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."

Don't be curmudgeonly. Thoughtful criticism is fine, but please don't be rigidly or generically negative."

etc.


Plenty of humans can't do arithmetic. Can they also not reason?

Reasoning isn't a binary switch. It's a multidimensional continuum. AI can clearly reason to some extent even if it also clearly doesn't reason in the same way that a human would.


> Plenty of humans can't do arithmetic. Can they also not reason?

I just pointed out that this isn't valid reasoning ... it's a fallacy of denial of the antecedent. No one is arguing that because LLMs can't do arithmetic, therefore they can't reason. After all, zamalek said that he can't quickly multiply large numbers in his head, but he isn't saying that therefore he can't reason.

> Reasoning isn't a binary switch. It's a multidimensional continuum.

Indeed, and a lot of humans are very bad at it, as is clear from the comments I'm responding to.

> AI can clearly reason to some extent

The claim was about LLMs, not AI. This is like if someone said that chihuahuas are little and someone responded by saying that dogs are tall to some extent.

LLMs do not reason ... they do syntactic pattern matching. The appearance of reasoning is because of all the reasoning by humans that is implicit in the training data.

I've had this argument too many times ... it never goes anywhere. So I won't respond again ... over and out.


> Indeed, and a lot of humans are very bad at it, as is clear from the comments I'm responding to.

This is your idea of "conversing curiously" and "editing out swipes," I suppose.

> I've had this argument too many times ... it never goes anywhere. So I won't respond again ... over and out.

A real reasoning entity might pause for self-examination here. Maybe run its chain of thought for a few more iterations, or spend some tokens calling research tools. Just to probe the apparent mismatch between its own priors and those of "a lot of humans," most of whom are not, in fact, morons.


> Don't be snarky.

ROFL

> Comments should get more thoughtful and substantive

Yes, they should, but instead we're stuck with the stochastic-parrot crowd, who log onto HN and try their best to emulate a stochastic parrot.


i assert that by your evidentiary standards humans don't reason.

presumably one of us is wrong.

therefore, humans don't reason.


LLMs don't use tools. Systems that contain LLMs are programmed to use tools under certain circumstances.

you’re just abstracting it away into this new “systems” definition

when someone says LLMs today they obviously mean software that does more than just text; if you want to be extra pedantic you can even say LLMs by themselves can't even generate text, since they are just model files if you don't add them to a "system" that makes use of those model files, doh


> when someone says LLMs today they obviously mean ...

LLMs, if the someone is me or others who understand why it's important to be precise. And in this context, the distinction between LLM and AI mattered--not pedantic at all.

I won't respond further ... over and out.


bunnie, your book "Hacking the Xbox" taught me how to get started on reversing electronics, took the fear out of the process, and replaced it with fun. Thanks for the multi-decade-long effort you've made to make these tools available and accessible and approachable; your contributions to the hacker community are immeasurable and I cannot say thank you enough.

Thanks man!


Thank you for sharing! Comments like this make all the effort worthwhile. <3


If automated AI rewrites are generally feasible, then the marginal price of nearly all software trends to zero.


If code becomes essentially free (ignoring for a moment the environmental cost, or the long-term cost of allowing code generation to be tollboothed by AI megacorps), the value of code must lie in its track record.

The 5-day-old code in chardet has little to no value. The battle-tested years-old code that was casually flushed away to make room for it had value.


What you describe is essentially what happened: the AI, working from specs and tests, produced a result more performant than the original. The "real AI" you describe just rewrote chardet without looking at the source - only better.


How do you know it didn’t look at the source?


It was instructed to look at the source...


It was instructed NOT to look at the source, with the one exception that it was told to look at this single file full of charset definitions: https://github.com/chardet/chardet/blob/f0676c0d6a4263827924...


Is there any visibility or accountability to record exactly what it did and did not look at? I doubt it. So we're left with a kind of Rorschach test: some people think LLMs follow rules like law-abiding citizens, and some people distrust commercial LLMs because they understand that commercial LLMs were never designed for visibility and accountability.


There should exist a .jsonl file somewhere with exactly that information in it - might be worth Dan preserving that; it should be in a ~/.claude/projects folder.
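For instance, something like the following could surface whether the original source was ever read (hedged: the exact schema of those session logs isn't documented here, so this just searches each serialized JSON event for path-like strings rather than assuming field names):

    import json, pathlib

    # Rough sketch: walk the session logs (JSONL, one event per line) under
    # ~/.claude/projects and print every line that mentions the chardet
    # source tree, as a first pass at "did it ever open the original code?".
    logs = pathlib.Path.home() / ".claude" / "projects"
    for path in logs.rglob("*.jsonl"):
        for i, line in enumerate(path.read_text(encoding="utf-8").splitlines(), 1):
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue
            if "chardet/" in json.dumps(event):
                print(f"{path}:{i}")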

