Hacker News | new | past | comments | ask | show | jobs | submit | epicprogrammer's comments

I've been noticing this creeping into my own AI coding suggestions lately. An LLM doesn't inherently understand "abandonware" or community health; it just sees that a package technically solves the logic puzzle in its context window. We've spent the last decade building CI/CD tooling to catch known CVEs, but we don't have great guardrails for an AI confidently importing an 8-year-old unmaintained library that happens to have zero reported vulnerabilities simply because nobody has looked at it in a decade.


It’s an interesting throwback to SEDA, but physically passing file descriptors between cores as a connection changes state is usually a performance killer on modern hardware. While it sounds elegant on a whiteboard to have a dedicated 'accept' core and a 'read' core, you end up trading a slightly simpler state machine for massive L1/L2 cache thrashing: every time you hand off a connection, you invalidate the buffers and TCP state you just built up. There’s a reason the industry largely settled on shared-nothing architectures like NGINX's, where a single pinned thread handles the entire lifecycle of a request and keeps all that data strictly local to the CPU cache. When you're trying to scale, respecting data locality almost always beats pipeline cleanliness.


You could presumably have an acceptor thread per core, which passes fds to the next thread aligned to the same core, and so on.

That would get you the code-simplicity benefits the article suggests while keeping each socket bound to a single core, which is definitely needed.

Depending on whether you actually need to share anything, you could do process per core, one thread per loop, and have no core-to-core communication in the normal workings of the process (I/O may still cross cores, though).
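For reference, the acceptor-to-worker handoff described above is typically done on Linux by passing the fd over a Unix-domain socket with SCM_RIGHTS ancillary data. A minimal sketch using Python's `socket.send_fds`/`recv_fds` (3.9+); the pipe stands in for an accepted TCP connection:

```python
import os
import socket

# A socketpair simulates the channel between acceptor and worker.
acceptor, worker = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Stand-in for an accepted connection fd: any fd works for the demo.
r, w = os.pipe()

# Acceptor side: send one byte of payload plus the fd as ancillary data.
socket.send_fds(acceptor, [b"x"], [r])

# Worker side: receive the payload and a duplicated fd it can now own.
msg, fds, flags, addr = socket.recv_fds(worker, 1, 1)
os.write(w, b"hello")
assert os.read(fds[0], 5) == b"hello"  # worker reads via the passed fd
print("fd handed off successfully")
```

Note the cost the parent comments are debating: the kernel dups the fd across the boundary, but none of the cache-hot connection state travels with it.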


I don't think the author intended "code simplicity" as an end unto itself but a way to reduce cache pressure. He popped into the 2016 discussion [1] to say:

> Another benefit of this design overlooked is that individual cores may not ever need to read memory -- the entire task can run in L1 or L2. If a single worker becomes too complicated this benefit is lost, and memory is much much slower than cache.

I think this is wrong or at least overstated: if you're passing off fds and their associated (kernel- and/or user-side) buffers between cores, you can't run entirely in L1 or L2. And in general, I'd expect data to be responsible for much more cache pressure than code, so I'm skeptical of localizing the code at the expense of the data.

But anyway, if the goal is to organize which cores are doing the work, splitting a single core's work from a single thread (pinned to it) to several threads (still pinned to it) doesn't help. It just introduces more context switching.

[1] https://news.ycombinator.com/item?id=10874616


> But anyway, if the goal is to organize which cores are doing the work, splitting a single core's work from a single thread (pinnned to it) to several threads (still pinned to it) doesn't help. It just introduces more context switching.

(Mostly agreeing with you, I think). I think looking at the overall system and saying (handwavy numbers) 25% of system time is spent on accept and 75% on request handling, so let's dedicate 25% of cores to accept and 75% to request handling, is unfortunately also the wrong way to split the work. Each core would run a small userland loop, but communication between processors is expensive. And you get more kernel-side cross-processor communication than necessary too, because the TCP state is touched by the processor handling the NIC queue it arrives on, the processor servicing the listen queue in userland, and then the processor handling the request in userland. Setting up your system for high interprocessor communication limits the number of cores you can effectively use.


Agreed. The best is going to be to use steering [1] and one pinned thread per core to keep each connection handled on one core as completely as possible.

...with the caveat that it makes the load-balancing much harder when each core is essentially an independent server. If you overload some cores, even briefly, your tail latency will really suffer. And if you decrease utilization to compensate for it, you've lost the efficiency advantage you were going for too. Such that the more conventional approach of a single multi-core reactor can be much better if you don't have a very good load-balancing story.

...another caveat: if you have some massive shared dataset (think search), the cache-efficient approach goes the total other way: each core should own some shard, and a single request should be fanned out across all of them.

...so the best model may vary, but it's not the one in this article.

[1] https://www.kernel.org/doc/html/v5.1/networking/scaling.html
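The "one pinned thread per core" approach mentioned above comes down to setting CPU affinity. A hedged, Linux-only sketch using `os.sched_setaffinity` (core numbering and worker structure are illustrative; a real server would pin once at worker startup):

```python
import os

def pin_to_core(core_id: int) -> None:
    """Pin the calling process/thread to a single CPU core (Linux-only).

    pid 0 means "the caller", so each worker can pin itself at startup.
    """
    os.sched_setaffinity(0, {core_id})

# Demo: pin to the lowest core we're allowed to use, verify, then restore.
allowed = os.sched_getaffinity(0)
core = min(allowed)
pin_to_core(core)
assert os.sched_getaffinity(0) == {core}
os.sched_setaffinity(0, allowed)  # undo so the demo is side-effect free
print(f"pinned to core {core} and restored mask of {len(allowed)} cores")
```

Combined with RSS/RPS steering (the kernel doc linked above), this keeps a connection's interrupt handling and userland processing on the same core.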


Well, kernels have grown some support for steering accept() directly to a worker thread. For instance SO_REUSEPORT (Linux) / SO_REUSEPORT_LB (FreeBSD).
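With SO_REUSEPORT, each per-core worker binds its own listening socket on the same port and the kernel load-balances incoming connections between them, so no userland fd handoff is needed. A minimal sketch (Linux 3.9+; port and addresses are illustrative):

```python
import socket

def make_listener(host: str, port: int) -> socket.socket:
    """Create a listening socket that shares its port with other workers.

    Every worker must set SO_REUSEPORT before bind(), and all sockets
    must belong to the same UID for the kernel to group them.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((host, port))
    s.listen(128)
    return s

# Two "workers" bind the same port without EADDRINUSE.
a = make_listener("127.0.0.1", 0)   # port 0 -> kernel picks; real workers share a fixed port
port = a.getsockname()[1]
b = make_listener("127.0.0.1", port)
print("both workers listening on port", port)
a.close(); b.close()
```

Each accepted connection then lives and dies on the worker that received it, which is the data-locality property the thread is after.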


While I agree that shared-nothing beats the pants off shared-state performance-wise, surely the penalty you've outlined only applies to very short-lived connections?

For longer-lived connections the cache is going to thrash on an inevitable context switch anyway (either due to waiting for more I/O or normal preemption). As long as processing of a connection's I/O stays on a given core, I don't know if there is actually such a huge benefit. A single pinned thread for the entire lifecycle also has the problem that you get latency bottlenecks under load when two CPU-heavy requests contend for the same core, versus work stealing making use of available compute.

The ultimate benefit would come from giving each core a dedicated NIC queue, so the interrupts arrive on the very core that processes each packet. Otherwise you're already taking the NIC interrupt on some other core and doing a cross-core delivery of the I/O data anyway.

TLDR: It's super complex to get a truly shared-nothing approach unless you have a single application and you allocate the work correctly. It's really hard to solve generically and optimally for all possible combinations of request and processing patterns.


Having spent some time in the anti-abuse and Trust & Safety space, I always take these vendor reports with a massive grain of salt. It’s a classic case of comparing apples to vendor-marketing oranges. A headline screaming about an 84% miss rate sounds like a systemic collapse until you look at the radically different constraint envelopes a global default like GSB and a specialized enterprise vendor operate under.

The biggest factor here is the false-positive cliff. Google Safe Browsing is the default safety net for billions of clients across Chrome, Safari, and Firefox. If GSB’s false-positive rate ticks up by even a fraction of a percent, they end up accidentally nuking legitimate small businesses, SaaS platforms, or municipal portals off the internet. Because of that massive blast radius, GSB fundamentally has to be deeply conservative. A boutique security vendor, on the other hand, can afford to be highly aggressive because an over-block in a corporate environment just results in a routine IT support ticket.

You also have to factor in the ephemeral nature of modern phishing infrastructure and basic selection bias. Threat actors heavily rely on automated DGAs and compromised hosts where the time-to-live for a payload is measured in hours, if not minutes. If a specialized vendor detects a zero-day phishing link at 10:00 AM, and GSB hasn't confidently propagated a global block to billions of edge clients by 10:15 AM, the vendor scores it as a "miss." Add in the fact that vendors naturally test against the specific subset of threats their proprietary engines are tuned to find, and that 84% number starts to make a lot more sense as a top-of-funnel marketing metric rather than a scientific baseline.

None of this is to say GSB is perfect right now. It has absolutely struggled to keep up with the recent explosion of automated, highly targeted spear-phishing and MFA-bypass proxy kits. But we should read this report for what it really is: a smart marketing push by a security vendor trying to sell a product, not a sign that the internet's baseline immune system is totally broken.


> We also ran the full dataset of 263 URLs (254 phishing, 9 confirmed legitimate) through Muninn's automatic scan. This is the scan that runs on every page you visit without any action on your part. On its own, the automatic scan correctly identified 238 of the 254 phishing sites and only incorrectly flagged 6 legitimate pages. [...] The tradeoff is that it flagged all 9 of the legitimate sites in our dataset as suspicious, ...

Am I missing something, or is that a 66%/100% false-positive rate on legitimate sites?

If GSB had that false-positive rate, it would be absolutely unusable. So comparing the two head-to-head is just wrong.
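The rates implied by the quoted numbers can be checked directly (figures taken from the vendor's own post):

```python
# Numbers quoted from the vendor's post.
phishing_total, phishing_caught = 254, 238   # automatic scan
legit_total, legit_flagged_auto = 9, 6       # automatic scan
legit_flagged_deep = 9                       # deep scan

detection_rate = phishing_caught / phishing_total  # true-positive rate
fpr_auto = legit_flagged_auto / legit_total        # false-positive rate, auto
fpr_deep = legit_flagged_deep / legit_total        # false-positive rate, deep

print(f"detection {detection_rate:.1%}, "
      f"auto FPR {fpr_auto:.1%}, deep FPR {fpr_deep:.1%}")
# -> detection 93.7%, auto FPR 66.7%, deep FPR 100.0%
```

So the automatic scan flags two thirds of the (tiny, n=9) legitimate sample, and the deep scan flags all of it.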


The 9/9 is actually crazy, and then they posted about it as if they'd found something? What they really did was find a major issue in their own product and then tell the world about it; that just doesn't seem right.


Crazy, and also like, 9? The sample size in that part of your test suite is 9?


It would seem their deep scan deems every site phishing: it flagged 100% of the legitimate sites as phishing along with the actual phishing sites. Incredible.


The deep scan detected all phishing sites correctly with the unfortunate tagging of legit sites as phishing too. I imagine their code looks something like isPhishing = true.


lol


> I always take these vendor reports with a massive grain of salt.

Yeah. "Here's a blog post with some casually collected numbers about our product [...] It turns out that it's great!" is sorta boring.

But couple that with a headline framed as "Google [...] Bad" and straight to the top of the HN front page it goes!


> I always take these vendor reports with a massive grain of salt. It’s a classic case of comparing apples to vendor-marketing oranges. A headline screaming about an 84% miss rate sounds like a systemic collapse until...

I've seen this before in the ip blocklist space... if you're layering up firewall rules, you're bound to see the higher priority layers more often.

That doesn't mean the other layers suck, security isn't always an A or B situation...

On the other hand, I don't know how I feel about how GSB is implemented... you're telling google every website you go to, but chances are the site already has google analytics or SSO...


I thought it checked against a local list of hashes, with frequent updates?
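For reference, the local-hash scheme being described works roughly like this: the client keeps a set of truncated SHA-256 prefixes and only contacts the server for full-hash confirmation on a prefix hit. A simplified sketch, not the actual Safe Browsing wire format or URL canonicalization:

```python
import hashlib

def url_hash(canonical_url: str) -> bytes:
    """SHA-256 of a (pre-canonicalized) URL expression."""
    return hashlib.sha256(canonical_url.encode()).digest()

def prefix_matches(url: str, local_prefixes: set[bytes], n: int = 4) -> bool:
    """True if the URL's truncated hash prefix is in the local list.

    A real client would then fetch full hashes from the server to
    confirm, since prefixes can collide.
    """
    return url_hash(url)[:n] in local_prefixes

# Toy local database containing one known-bad URL's 4-byte prefix.
bad = "evil.example/phish"
local = {url_hash(bad)[:4]}
assert prefix_matches(bad, local)
assert not prefix_matches("good.example/", local)
print("prefix check works; only matches phone home")
```

The upshot for the privacy question above: in this design the server only learns about URLs whose prefixes match, not every page visited.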


This is how Firefox does it; I can't speak for the rest.


These are fair points and I agree with a lot of them. GSB operates at a scale we don't, and the conservatism that comes with being the default for billions of users is a real constraint. The post tries to acknowledge that ("the takeaway from all of this is not that Google Safe Browsing is bad") and we're upfront about the timing caveat since these were checked at time of scan.

Where I'd push back is on what this means for the average person. Most people have no protection against phishing beyond what their email provider and browser give them. If that protection is fundamentally reactive, catching threats hours or days after they go live, that's a real limitation worth talking about honestly. The 84% number isn't meant to say GSB is broken. It's meant to say there's a gap, and that gap has consequences for real users regardless of the engineering reasons behind it.

On the marketing angle, we aren't currently selling anything. The extension is free and so is submitting URLs for verification. We recognize it would be disingenuous to say we never will, but at the very least the data and the ability to check URLs (similar to PhishTank before they closed registration) will always be free. The dataset is also sourced from public threat intelligence feeds, not a curated set designed to make our tool look good. We think publishing findings like this is valuable even if you set aside everything about our tools.


> We think publishing findings like this is valuable even if you set aside everything about our tools.

In what way is it valuable?


It really is the classic "you either die a safety-first AI lab, or you live long enough to see yourself back at the Pentagon negotiating table" arc.


It's easy to frame this purely as an ethical battle, but there's a massive financial reality here. Training frontier models requires astronomical amounts of capital, and the DOD is one of the few entities with deep enough pockets to fund the next generation of compute. Anthropic turning down this Pentagon contract over safety disagreements is a huge gamble. They are essentially betting that the enterprise market will reward their 'Constitutional AI' approach enough to offset the billions OpenAI will now make from government defense contracts. OpenAI wants the DOD money while maintaining a consumer-friendly PR sheen; Amodei is just pointing out that they can't have it both ways.


It’s a $200M contract. That’s not nothing, but it’s definitely not a huge sum for these companies at their scale, when they’re spending billions on infrastructure.

I’m sure Anthropic has signed up more revenue this week in response to this debacle than that. Where they’re actually screwed is if the government follows through and declares Anthropic a supply-chain risk.


It's not "just" a $200m contract, it's the start of a lucrative relationship

1. Stargate seemed to require a dedicated press conference by the President to achieve funding targets. Why risk that level of politicization if it didn't?

2. Greg Brockman donated $25mil to Trump MAGA Super PAC last year. Why risk so much political backlash for a low leverage return of $200m on $25m spent?

3. During WW2, military spend shot from 2% to 40% of GDP. The administration is requesting $1.5T military budget for FY2027, up from $0.8T for FY2025. They have made clear in the past 2 months that they plan to use it and are not stopping anytime soon

If you believe "software eats the world" it is reasonable to expect the share of total military spend to be captured by software companies to increase dramatically over the next decade. $100B (10% of capture) is a reasonable possibility for domestic military AI TAM in FY2027 if the spending increase is approved (so far, Republicans have not broken rank with the administration on any meaningful policy)

If US military actions continue to accelerate, other countries will also ratchet up military spend - largely on nuclear arsenals and AI drones (France already announced increase of their arsenal). This further increases the addressable TAM

Given the competition and lack of moat in the consumer/enterprise markets, I am not sure that there is a viable path for OpenAI to cover its losses and fund its infrastructure ambitions without becoming the preferred AI vendor for a rapidly increasing military budget. The devices bet seems to be the most practical alternative, but there is far more competition both domestically (Apple, Google, Motorola) and globally (Xiaomi, Samsung, Huawei) than there is for military AI.

Having run an unprofitable P&L for a decade, I can confidently state that a healthy balance sheet is the only way to maintain and defend one's core values and principles. As the "alignment" folks on the AI industry are likely to learn - the road to hell (aka a heavily militarized world) is oft paved with the best intentions


First, I have to say I loved your thoughtful & detailed comment. You have clearly considered this from the financial side; let me add some color from the perspective of someone working with frontier researchers.

> As the "alignment" folks on the AI industry are likely to learn

I will push back here. Dario & co are not starry-eyed naive idealists as implied. This is a calculated decision to maximize their goal (safe AGI/ASI.)

You have the right philosophy on the balance sheet side of things, but what you're missing is that researchers are more valuable than any military spend or any datacenter.

It does not matter how many hundreds of billions you have - if the 500-1000 top researchers don't want to work for you, you're fucked; and if they do, you will win because these are the people that come up with the step-change improvements in capability.

There is no substitute for sheer IQ:

- You can't buy it (god knows Zuck has tried, and failed to earn their respect).

- You can't build it (yet.)

- And collaboration amongst less intelligent people does not reliably achieve the requisite "Eureka" realizations.

Had Anthropic gone forth with the DoD contract, they would have lost this top crowd, crippling the firm. On the other hand, by rejecting the contract, Anthropic's recruiting just got much easier (and OAI's much harder).

Generally, the defense crowd have a somewhat inflated sense of self worth. Yes, there's a lot of money, but very few highly intelligent people want to work for them. (Almost no top talent wants to work for Palantir, despite the pay.) So, naturally:

- If OpenAI becomes a glorified military contractor, they will bleed talent.

- Top talent's low trust in the government means Manhattan Project-style collaborations are dead in the water.

As such, AGI will likely emerge from a private enterprise effort that is not heavily militarized.

Finally, the Anthropic restrictions will last, what, 2.5 more years? They are being locked out of a narrow subset of usecases (DoD contract work only - vendors can still use it for all other work - Hegseth's reading of SCR is incorrect) and have farmed massive reputation gains for both top talent and the next administration.


This is an interesting perspective. What happens if there is a large global war? Do researchers who were previously against working with the DoD end up flipping out of duty? Does the war budget go up? Does the DoD decide to lift any ban on Anthropic for the sake of getting the best model and does Anthropic warm its stance on not working with autonomous weapons systems?

I don’t know the answers to these questions, but if the answer is “yes” to at least 1 or 2, then I think the equation flips quite a bit. This is what I’m seeing in the world right now, and it’s disconcerting:

1. Ukraine and Russia have been in a conflict that has been drawn out much longer than most people would have guessed. This has created a divide in political allegiance within the United States and Europe.

2. We captured the leader of Venezuela. Cuba is now scared they are next.

3. We just bombed Iran and killed their supreme leader.

4. China and the US are, of course, in a massive economic race for world power supremacy. The tensions have been steadily rising, and they are now feeling the pressure of oil exports from Iran grinding to a halt.

5. The past couple days Macron has been trying to quell tension between Israel and Lebanon.

I really hope we are not headed into war. I hope the fact that we all have nukes and rely on each other's supply chains deters one. But man, does it feel like the odds are increasing, and man, does that throw a wrench in this whole thing with Anthropic vs. OpenAI.


> 3. We just bombed Iran and killed their supreme leader.

Being accurate, by all reporting Israel killed Iran's leadership.

Yes, likely enabled by US intelligence, but the one who pulls the trigger does matter.


"We" here clearly means USA+israel. There isn't a distinction between the two when they're working towards the same goals, bombing everything in sight, together.

The one who pulled the trigger is irrelevant here, because both have pulled the trigger hundreds or thousands of times in the past few days, dividing up targets between them for the joint operation.


Given that direct assassination is still prohibited by EO 11905 / 12036 / 12333, it's a major issue if the US president ordered the strike or not.

I'm aware that internet forums like to play fast and loose with insinuations, but facts are facts.


> Given that direct assassination is still prohibited by EO 11905 / 12036 / 12333

It sounds like you think this means something?

Obviously it doesn't when we're talking about an administration that openly breaks laws, much less EOs, and issues whatever EOs they want saying whatever they want, even in violation of previous EOs. There aren't even any repercussions to the president "violating an EO".

So, the pedantry here is irrelevant. The two parties are on the same team, working towards the same goal, doing the same things, divvying up the list of targets to strike.


> It sounds like you think this means something?

If you'd rather talk with yourself, I'll see myself out of this convo. No time for folks who would rather indulge in hyperbole than messy reality.


Given that you totally ignored the substance of my post, and instead focused on attacking me personally, it does seem like you're not interested in a discussion, and not a good fit for the HN culture and guidelines. So yeah, maybe you are right and it would be better if you left.

But! That's not who you always have to be! I'm confident you can coherently articulate your point without resorting to that. Feel free to come back if you're willing to share why you feel the president not complying with a presidential executive order is significant here, rather than insignificant.

Anyways, happy friday!


That is assuming there will be elections, which many people believe won't be the case.

Reminder that Trump has been flirting with simply staying in power (2028 hats, talk of a third term) and is responsible for attempting a coup the last time he lost.

Personally, I think there's a possibility he'll just declare martial law and stay in power at the end of his term.


> researchers are more valuable than any military spend or any datacenter. It does not matter how many hundreds of billions you have - if the 500-1000 top researchers don't want to work for you, you're fucked; and if they do, you will win because these are the people that come up with the step-change improvements in capability.

This is a massive cope imo. The reason that the AI industry is so incestuous is just because there are only a handful of frontier labs with the compute/capital to run large training clusters.

Most of the improvements that we’ve seen in the past 3 years are due to significantly better hardware and software, just boring and straightforward engineering work, not brilliant model architecture improvements. We are running transformers from 2017. The brilliant researchers at the frontier labs have not produced a successor architecture in nearly a decade of trying. That’s not what winning on research looks like.

Have there been some step-change improvements? Sure. But by far the biggest improvement can be attributed to training bigger models on more badass hardware, and hardware availability to serve it cheaply. To act like the DoD isn’t going to be able to stand up pytorch or vllm and get a decent result is hilarious: the reason you use slurm and MPI and openshmem is because national labs and DoD were using it first. NCCL is just gpu accelerated scope-reduced MPI. nvshmem is just gpu accelerated scope-reduced openshmem.

If anything, DoD doesn’t have the inference throughput requirements that the unicorns have and might just be able to immediately outperform them by training a massive dense model without optimizing for time to first token or throughput. They don’t have to worry about if the $/1M tokens makes it economically feasible to serve, which is a primary consideration of the unicorns today when they’re choosing their parameter counts. They can just rate limit the endpoint and share it, with a 2 hour queue time.

The government invented HPC, it’s their world and you’re just playing in it.

> Generally, the defense crowd have a somewhat inflated sense of self worth.

/eyeroll but nobody can do what you do!


Sure, the architecture is from 2017. But the gap between GPT-1 and frontier models today is not simply "more FLOPs", nor as simple as "standing up PyTorch and vLLM": there are thousands of undocumented decisions about data, alignment, reward modeling, training stability, and inference-time strategies, and lots of tribal knowledge held by a small group of people who overwhelmingly do not want to work on weapons systems.

The dense-model argument is self-defeating long term. Sparsity (MoE etc.) lets you build a smarter model at the same compute budget, so going dense just because you can afford to waste FLOPs is how you fall behind: you never make the step-function improvements needed.

Sure, the DoD invented HPC, but it also invented the internet, and then the private sector made it actually useful.


That is the loss from the Pentagon directly only. Now they will lose much more, because no defense contractor, subcontractor, and so on can use them for anything defense-related (even if they use the model to invent a new type of screw, if that screw is going into anything military).

So yeah, they bet a whole lot on “look at us, we have morals”.


Their revenue went up $4 billion in the week since this story started.


There's no legal basis for blocking defense contractors from using them. Trump's claiming he can do so, but the law doesn't back him up. He'll lose in any fair court, or any corrupt court that values billionaire interests over virtue signaling to the orange one (like the Supreme Court).

Also, they got a huge PR win, and jumped to #1 on the Apple App Store. Consumer market share is going to decide which of the AI companies is the market leader, not fickle government contracts.


Consumer market share? Absolutely not.

If you look at what generates cash, it's corp-to-corp, and that's true across most industries. While some markets are consumer-heavy, LLMs have enormous business-facing revenue potential. The consumer market is a gnat in comparison.


There are always executive orders that can enforce that. It's not like in the movies, where they sort things out in two weeks in a single trial. It's going to take years, and we'll see if Anthropic survives that.


I'm guessing they believe they will be around longer than this administration.


I think the point is that there's potentially a lot more than $200m in defense dollars at stake here, in the future.


There are certainly black budget dollars at stake as well, which are much more lucrative.


> It's easy to frame this purely as an ethical battle, but there's a massive financial reality here.

As opposed to all those famous ethical battles where there's nothing in it for you to do the wrong thing?


Based on OP's comment history, 50/50 chance AI wrote that...


> but there's a massive financial reality here.

Not a chance. The DoD has massive pockets which are INCREDIBLY SPREAD OUT. You can't overstate how spread out this money is. The DoD has maybe a 64-GPU cluster, and ALMOST NO ONE USES IT FOR DEEP MODEL TRAINING. Even contractors end up working with DGX boxes to do all their training.

As of 2023, I was doing the largest deep learning training runs of anyone I knew in the industry, and I've been in the industry for 20 years. The second-best groups behind mine were using local 4-GPU machines that they had to purchase on contract.

There's no way the DoD can train these models themselves, not even close. They are COMPLETELY DEPENDENT ON INDUSTRY. I was the PM for a DARPA program in 2023 and saw the SAME PROBLEM. They had no compute, or would rely on university compute if the program had a university partner. YOU HAVE NO IDEA HOW FAR BEHIND THE DOD IS IN THIS SPACE.


Are you arguing against free market capitalism in favor of fascism? If OpenAI needs billions of taxpayers money to survive then should that project exist? Why?


The most fascinating detail here isn't just the images, but the temperature data. JWST actually found a localized 'cold spot' (538K) right in the core of Io's magnetic footprint, surrounded by a much hotter main aurora. The fact that the ion density shifts so drastically in a matter of minutes is wild.


I've built a few internal tools using the Workspace APIs, and while they are powerful, the rate limits on the Drive API can be brutal if you are doing bulk operations. Does this repository handle automatic backoff and retries, or do we need to wrap it ourselves?
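Whether or not the repo handles it, the standard pattern for Drive API rate-limit errors (403 rate-limit / 429 responses) is capped exponential backoff with jitter. A minimal, library-agnostic sketch; the error-classification predicate and the toy `flaky` call are assumptions for illustration:

```python
import random
import time

def with_backoff(call, is_retryable, max_retries=5, base=0.5, cap=32.0):
    """Retry `call` with capped exponential backoff plus full jitter.

    Sleeps a random amount in [0, min(cap, base * 2**attempt)] between
    attempts; re-raises immediately on non-retryable errors.
    """
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as exc:
            if attempt == max_retries or not is_retryable(exc):
                raise
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))

# Toy usage: a flaky call that succeeds on the third try.
state = {"n": 0}
def flaky():
    state["n"] += 1
    if state["n"] < 3:
        raise RuntimeError("rate limited")  # stands in for an HTTP 429
    return "ok"

result = with_backoff(flaky, lambda e: "rate limited" in str(e), base=0.01)
print(result, "after", state["n"], "attempts")  # -> ok after 3 attempts
```

With the real client library you'd make `is_retryable` inspect the HTTP status (and ideally honor any Retry-After header) rather than matching strings.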


I am one of the builders behind Dropstone.

Yesterday, I posted our whitepaper here as a 'discovery' because I was nervous about a cold launch. That was a mistake, and the community rightfully flagged it for lacking transparency. I apologize for the cloak and dagger.

We are reposting this today as an official Show HN to stand behind the tech properly.

The Problem: We built this because we hit the 'Linearity Barrier' with standard agents—after 50+ coding steps, context rot sets in and the agent starts hallucinating.

The Solution: Dropstone uses a Recursive Swarm Topology. Instead of linear prediction, it spawns parallel 'Scout' agents to explore solution paths and uses Entropy Pruning to kill branches that hallucinate.

I'm here to answer any technical questions about our D3 Engine, the latency trade-offs of swarm architecture, or the 'Trajectory Vectors' we use for context management.


You are absolutely right. I addressed this in the thread above, but to be clear: Yes, I am part of the team. I shouldn't have tried to frame it as a 'discovery'. Apologies.


I am incredibly sorry about that. It sounds like the agent hit an infinite reasoning loop and burned your credits—that is a critical failure on our end. I want to fix this personally. Please email me at tom@blankline.org (or just reply here if you prefer). I will refund your $15 immediately.

I'm adding free credits to your account so you can test the D3 update when it drops tomorrow (which patches this loop).

We clearly have work to do on the beta fail-safes.

