It is text prediction. But to predict text, other things need to be computed along the way. If you can step back for a minute, I can offer a very simple but adjacent idea that might help build intuition for the complexity of "text prediction".
Take the digits 0 to 9 and the + and = operators. I will train my model on this dataset, except the model won't get the list of symbols; it will get a bunch of addition problems. A lot of them. But not every possible addition problem in that space will be represented, not by a long shot, and neither will every number. Still, the model will be able to solve any addition problem you can form with those symbols.
It’s just predicting symbols, but to do so it had to internalize the concepts.
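A minimal sketch of that toy setup, in case it helps (the names and the 0..999 range are my own illustration, not a real training run): sample only a sliver of the possible problems as training data, leaving the vast majority as unseen cases the model has to generalize to.

    import random

    random.seed(0)

    def make_problem(a: int, b: int) -> str:
        # Each example is just a flat symbol sequence, e.g. "37+5=42".
        return f"{a}+{b}={a + b}"

    # Full space of two-operand problems over 0..999.
    all_pairs = [(a, b) for a in range(1000) for b in range(1000)]

    # Train on only 5% of the space; the other 95% is never seen.
    train_pairs = set(random.sample(all_pairs, k=len(all_pairs) // 20))
    train_set = [make_problem(a, b) for a, b in train_pairs]

    unseen = [p for p in all_pairs if p not in train_pairs]
    print(f"{len(train_set)} training problems, {len(unseen)} never shown")

If the model had only memorized, it would fail on the 95% it never saw; if it solves those anyway, it had to pick up something like the addition rule itself.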
This gives the impression that it is doing something more than pattern matching. I think this kind of communication, where some human attribute is used to name a concept in the LLM domain, is causing a lot of damage, and it ends up inadvertently inflating the hype for AI marketing...
I think what's causing a lot of damage is *not* attributing more human attributes (carefully, of course). It's not the LLM marketing you have to worry about - that's just noise. All marketing is malicious lies and abusive bullshit; AI marketing is no different.
Care about engineering instead - designing and securing systems. There, the refusal to anthropomorphise LLMs is doing a lot of damage and wasting a lot of effort, with a good chunk of the industry believing in the "lethal trifecta" as if it were the Holy Trinity, convinced it's something that can be solved without losing all that makes LLMs useful in the first place. A little bit of anthropomorphising - squinting your eyes and seeing LLMs as little people on a chip - will immediately tell you these "bugs" and "vulnerabilities" are just inseparable facets of the features we care about, fundamental to general-purpose tools. They can be mitigated and worked around (at a cost), but not solved, any more than you can solve "social engineering" or reprogram your employees so they're impervious to coercion or bribery, or to being prompt-injected by a phone call from a loved one.
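To make the "mitigated at a cost" point concrete, here's a toy sketch (entirely my own illustration, not any real framework's API) of where the "lethal trifecta" framing leads: refuse any agent session that combines private-data access, untrusted input, and an exfiltration channel. Note what the mitigation costs you: that combination is exactly what a general-purpose agent needs.

    from dataclasses import dataclass

    @dataclass
    class Session:
        reads_private_data: bool      # e.g. email, files, databases
        sees_untrusted_content: bool  # e.g. web pages, inbound messages
        can_exfiltrate: bool          # e.g. HTTP requests, sending email

    def allowed(s: Session) -> bool:
        # The mitigation: never allow all three capabilities at once.
        return not (s.reads_private_data
                    and s.sees_untrusted_content
                    and s.can_exfiltrate)

    print(allowed(Session(True, True, False)))  # True: "summarize my inbox"
    print(allowed(Session(True, True, True)))   # False: "...and act on the web"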
Except I actually do mean inferring the concept of adding things from examples. LLMs are amply capable of applying concepts to data matching patterns never expressed in the training data. It's called inference for a reason.
Anthropomorphic descriptions are the most expressive because LLMs, trained on human cultural output, intrinsically mimic human behaviours. Other terminology is not nearly as expressive when describing LLM output.
Pattern matching is the same as saying text prediction. While technically truthy, it fails to convey the external effect. Anthropomorphic terms, while less truthy overall, do manage to effectively convey the external effect. They do unfortunately imply an internal cause that does not follow, but the externalities are what matter in most non-philosophical contexts.
> do manage to effectively convey the external effect
But the problem is that this does not convey the failure mode. If I'm understanding correctly, you are saying that the behavior of an LLM, when it works, is as if it has internalized the concepts.
But then it fails to convey that the model can also say stuff that completely contradicts what it said before, thereby also contradicting the notion of having "internalized" the concept.
There was a paper recently demonstrating that if you input different human languages, the middle layers of the model end up operating on the same probabilistic vectors. It's just the encoding/decoding layers that appear to handle the language-specific work.
So the conclusion was that these middle layers have their own internal language: the model converts the text into this language and then decodes it back out. It explains why the models sometimes switch to Chinese when they have a lot of Chinese-language inputs, etc.
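If you want to poke at this yourself, here's a rough sketch with an off-the-shelf multilingual encoder (the model choice and the mean-pooled cosine probe are my own assumptions, not the paper's exact method): compare hidden states for the same sentence in two languages, layer by layer.

    import torch
    from transformers import AutoModel, AutoTokenizer

    name = "xlm-roberta-base"  # any multilingual model works for the idea
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name, output_hidden_states=True)

    def mean_hidden(text: str, layer: int) -> torch.Tensor:
        # Mean-pool one layer's hidden states over all tokens.
        inputs = tok(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs)
        return out.hidden_states[layer].mean(dim=1).squeeze(0)

    en = "The cat sleeps on the mat."
    fr = "Le chat dort sur le tapis."
    for layer in (1, 6, 11):
        sim = torch.cosine_similarity(mean_hidden(en, layer),
                                      mean_hidden(fr, layer), dim=0)
        print(f"layer {layer:2d}: cosine similarity {sim:.3f}")

If the paper's claim holds, cross-language similarity should peak in the middle layers rather than at the language-facing ends.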
Oh, Jesus Christ. I learned to write at a college with a strict style guide that taught us how to use different types of punctuation to juxtapose two ideas in one sentence. In fact, they did/do a bunch of LLM work, so if anyone ever used student data to train models, I’m probably part of the reason they do that.
You sound like you’re trying to sound impressive. Like I said, I’ll read the paper.
Pretty obvious when you consider that neural networks operate on numbers and very complex formulas (built by combining several simple formulas with various weights). You can map a lot of things to numbers (words, colors, music notes, ...), but that doesn't mean the NN is going to produce useful results.
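For what it's worth, "combining several simple formulas with various weights" is concrete enough to show in a few lines (a toy illustration of my own, with random weights, so the output is numerically valid but meaningless until trained):

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # 3 inputs -> 4 units
    W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # 4 units -> 1 output

    def forward(x: np.ndarray) -> np.ndarray:
        h = np.maximum(0, W1 @ x + b1)  # simple formula: max(0, weighted sum)
        return W2 @ h + b2              # another weighted sum

    # Anything mapped to numbers can go in: a word embedding, an RGB color...
    print(forward(np.array([0.2, 0.5, 0.9])))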
Everything is obvious if you ignore enough of the details/problem space. I’ll read the paper rather than rely on my own thought experiments and assumptions.
After you go from millions of params to billions+, models start to get weird (depending on training). Just look at any number of interpretability research papers; Anthropic has some good ones.
I heard that they might have fixed the problem, but I initially dropped it when they stopped respecting quotes, even in verbatim mode. Like, if I’m looking for an obscure product number, I don’t want a bunch of shit with a few digits off if there are no actual hits. I want no hits if the settings and query demand it.
In my experience, most garden-variety security problems stem from a) the developer not understanding the implications of something (maybe because they’re new, or operating outside their usual domain), or b) the developer not paying close enough attention to realize they did something they know is stupid. We’re only human.
Vibe coding obviously doesn’t make something insecure, per se, but saying it doesn’t reduce the attention paid to any given line of code, or encourage less knowledgeable people to write code, seems pretty dubious to me.
The Claude Code team is clearly competent and professional, yet they accidentally published the proprietary source code for one of the world’s hottest products. That’s like a bank manager walking away with the keys in the door and the alarm disarmed. When’s the last time you heard of a human team of developers doing that?
Again, I’m not saying that vibe coding necessarily creates unsafe code, but I don’t see how anyone could say vibe coding is devoid of security implications. I think this is an organizational/logistical problem that we’ll figure out at some point, but I think it’s going to be ‘figured out’ the way C buffer overflows were: it never really goes away.
Very reasonable take, I agree 100%. But I don't think you're putting any responsibility on the users of such vibe-coded apps. OpenClaw was primarily marketed towards devs and people in touch with IT. They should know better.
Sure. I reckon blaming the system for the intentional actions of a few is a great way to avoid individual accountability. Conversely, blaming many individuals for fundamental systemic or leadership problems is a great way to avoid accountability for leaders and systemic beneficiaries. It’s not rational to exclude either.
I’m also not sure the ‘dev’ distinction makes much of a difference in this space, because chatbot marketing works pretty damn hard to imply everybody is a prompt away from being a developer. How are those people going to know they aren’t even qualified to make any given technical decision, let alone evaluate the output of a confident chatbot that’s magically writing programs for them?
You know you’re getting into zealot territory when people are arguing semantics over a headline pointing to a zero-authentication admin-access CVE that affects a double-digit percentage of users.
Thank you for the reality check. I like to assume people are coming from a certain baseline on HN, but I sometimes forget that certain topics have a passionate user base represented.
Nooope. Reread the thread from my comment up: they were arguing about whether that percentage of users warranted saying ‘probably’ in the headline. Nobody was even questioning the numbers at that point. Just people taking it at face value, getting defensive, and trying to minimize what it said.
Does it really? Digging into the data, for example the claim of 135k instances in the open, reeks of bullshit; I would suspect several other claims are exaggerated as well.
> Digging into the data, for example the claim of 135k instances in the open, reeks of bullshit; I would suspect several other claims are exaggerated as well.
Do you so stringently examine most CVEs? I’ll bet you don’t. Are you a big fan of this project? I’ll bet you are. Do you have any actual data to counter what they said or do you just sort of generally not vibe with it? If so, now would be a great time to break it out while this is still fresh. If not…
They are pointing out that the data provided does not appear to be real. There is no credible source linked for this 135k number. They do not need to provide a number of their own, as a credible one does not appear to exist.
It’s also only 65% of those that have zero authentication configured, according to that post (which I have done nothing to confirm or challenge at all… frankly, I wouldn’t touch OpenClaw with a ten-foot… cable?). That said, I think it’s far more important to get the attention of people who might otherwise not realize how closely they need to pay attention to CVEs than it is to avoid hyperbole in headlines.
Because 20% is not “probably got hacked” and overstates the problem for most users.
That doesn’t mean this isn’t a critical vulnerability, and I think it’s insane to run OpenClaw in its current state. But the current headline will burn your credibility, because 80% of users will be fine with no action, and they’ll take future security issues less seriously as a result.
All the numbers you are using appear to be made up by the Reddit poster. I say that because they provided no citations for them (for all I know, they got them from an AI). I attempted to verify the numbers they used and could not. By exaggerating the numbers, they are crying wolf.
a) Most people achieve social capital through relationships. Rich people gain it by distinguishing themselves among their already-distinguished peers. Even if being obnoxious is what’s making you famous, you’re still more famous than anyone you know.
b) The cadre of rich people you’ve actually heard of self-selects for craving attention and validation. Like most people, they aren’t good enough at anything to be famous organically, and like many such people, they’re also insecure about their profound lack of specialness. But few people have the money to buy the attention they crave.
Many developers overestimate the agency they’d have without extremely high labor demand. We got a say because replacing us was painful, not because of our ethics and wisdom. Without that leverage, developers are cogs just like every other part of the machine.
Python and C++ have been used for countless large projects, each for many more than TypeScript has. It’s all about trade-offs that account for your tasks, the coders available at the project’s commencement, the environment, etc.
People like to put household-name companies on pedestals, but the choices they make are mostly guided by what their people can do and which options give them the most value for free. They mostly operate the way smaller companies do, just with a bigger R&D budget to address issues like scale that the larger market has little incentive to solve.
Also, this product is like a year old… it has barely hit its teething phase. I wouldn’t be surprised if the core is still the prototype someone whipped up as a proof of concept.
I reckon some believe these companies are basically magical, and are utterly astonished when they’re shown to be imperfect in relatively uninteresting ways. I’m a lot more concerned about the sanity of the AI ecosystem they operate in than the stability of some front-end Anthropic made.
Or all of the people they didn’t ask, let alone compensate, who made all of the stuff they munged up for training data so they could sell cheap knockoffs in the same markets.
Tech industry folks have been so coddled for decades that many think their astonishing intellect has earned them a cushy life, rather than the fact that they’re in a field with high labor demand. It’s one reason tech workers are often considered arrogant and out of touch… and it’s why people think they can get paid to lightly orchestrate agents to do their jobs. Oof.
If efficiency gains create an oversupply of tech labor, even the bestie BFF bosses will notice the hordes of more-qualified people who would kill for any job that pays more than CVS or Uber, i.e., a lot less than most developers make now. The tech world regularly and shamelessly cuts higher-earning, higher-skill workers for cheaper “good enough” replacements. Best of luck.
Even many of the folks who see the writing on the wall have fanciful visions of using their astonishingly capable genius-developer brains to maintain, or quickly re-achieve, some of their high status in the trades. As a union tradesman, I’d find that misconception hilarious if I didn’t feel so bad for them. A lot of folks are going to have a lot of bitter medicine to swallow.
What do you mean by that? It’s literally text prediction, isn’t it?