skiing_crawling · 2026-06-22T23:26:08 1782170768

"it can fit" on 256GB of RAM, but it will be heavily quantized and still run very slowly. The headline number is not token generation, it's prompt processing. So if you get 10 tok/s and an API gives you 20-30 tok/s, it doesn't seem that bad on its face, but a Mac Studio or any other machine that's not loading all of it into GPU will do PP 20-50X slower than a purely GPU-based setup, which is what actually makes this unusable without $50k in GPUs.

On top of that, you will still be heavily quantized.
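A rough sketch of why prefill speed rather than generation speed dominates the wait: total time is prompt tokens divided by the prefill rate plus output tokens divided by the decode rate. Only the 10 tok/s decode figure and the 20-50X slowdown come from the comment above; the prompt size, output size, and GPU prefill rate are made-up illustration values.

```python
def response_seconds(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Seconds spent processing the prompt plus seconds generating the reply."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps

# Hypothetical workload: a long agent-style prompt with a short answer.
PROMPT_TOKENS = 50_000
OUTPUT_TOKENS = 500

GPU_PREFILL_TPS = 5_000.0  # assumed GPU prefill rate, not a measurement
DECODE_TPS = 10.0          # "10 tok/s" from the comment

gpu_seconds = response_seconds(PROMPT_TOKENS, OUTPUT_TOKENS, GPU_PREFILL_TPS, DECODE_TPS)
for slowdown in (20, 50):  # the 20-50X range from the comment
    slow_seconds = response_seconds(
        PROMPT_TOKENS, OUTPUT_TOKENS, GPU_PREFILL_TPS / slowdown, DECODE_TPS
    )
    print(f"{slowdown}x slower prefill: {slow_seconds:.0f}s vs {gpu_seconds:.0f}s")
```

With these assumed numbers the decode time is identical in every case, so the whole difference comes from prefill.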

gerdesj · 2026-06-22T23:49:40 1782172180

A nvidia spark thingie has 128GB unified RAM. They also have a dual port version of one of these things: https://www.nvidia.com/content/dam/en-zz/Solutions/networkin.... ie 2 x 100GB/s ports, they may even be 2 x 200GB/s. Once I've got my paws on one, I'll know more.

You can cluster these beasts too. Two and three (with two IP subnets) is fairly obvious. Four or more might need a switch depending on how much network latency affects things.

Apple seem to have forgotten about M series with gobs of RAM. I can't get the Apple shop to show more than 96GB of unified RAM and that costs a kidney.

mapontosevenths · 2026-06-23T00:25:24 1782174324

I have one, and I love it. That said, my buddy's Mac smokes it for inference workloads in terms of tokens per second AND it's more usable for other things.

If you are training and doing research it's great, and if you want to cluster them it can't be beat, but if you just want local inference on a single box, buy a Mac or even a Strix Halo device.

gerdesj · 2026-06-23T22:53:05 1782255185

Get your buddy with his smoking Mac to allow multiple concurrent connections and see how it gets on compared to your Spark. Don't ever use a single "chat" test to derive performance - try running say 10 or more.
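A minimal sketch of that kind of test: fire N requests at once and report aggregate tokens per second. `send_request` is a hypothetical caller-supplied function that performs one chat completion against whatever server is under test and returns the number of tokens generated; `fake_request` is only a stand-in so the sketch runs without a server.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_concurrent(send_request, n_clients):
    """Run n_clients requests in parallel; return aggregate tokens per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        token_counts = list(pool.map(lambda _: send_request(), range(n_clients)))
    elapsed = time.perf_counter() - start
    return sum(token_counts) / elapsed

def fake_request():
    """Stand-in for a real HTTP call: pretends to generate 100 tokens."""
    time.sleep(0.05)
    return 100

if __name__ == "__main__":
    for n in (1, 10):
        tps = run_concurrent(fake_request, n)
        print(f"{n:>2} clients: {tps:.0f} tok/s aggregate")
```

Swapping `fake_request` for a real client call and comparing the 1-client and 10-client numbers on each box gives the comparison suggested above.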

You might also notice that your Spark has a pair of QSFP28 or DD (not sure yet) type interfaces as well as the 10Gb/s ethernet - that network card is a right old beast and adds quite a lot to the cost. It is capable of either 200 or 400Gb/s and can be split into two lots of four. Your mate's Mac probably has a wifi connection and is too cool for ethernet 8)

That NIC is there for a good reason - the Spark wants some friends to cluster with and you will absolutely spank any Mac when you spaff Mac style money on say three of these beasts and some cables and cluster them up. If you want four or more, you will need a switch and Mikrotik and others have them.

Casual "tokens per second" in AI is a bit like gamers whittering on about "ping" when they are using TCP and UDP for their games. ICMP request/response is a handy way of testing network paths and can give some indications towards potential performance limitations.

colinsane · 2026-06-23T01:46:07 1782179167

can those macs boot linux? i've heard about Asahi but have no idea how far along they are. i've got my fleet configured with nix and sure, nix can target darwin, but there's a _lot_ of sharp edges there: i don't really want to pull that thread unless i have to...

mapontosevenths · 2026-06-23T02:17:49 1782181069

I don't know. I think he just uses LMStudio most of the time on his, but that's one place I can say the spark really shines for me.

I'm a Linux guy, but I also don't always have a lot of time. The Spark comes out of the box with a nice Linux distro that's pre-configured to be easy to set up, and the guides and online resources make getting up and running trivial, even for some complex tasks. You would have to do a LOT of tinkering just to figure out some of the things the Nvidia resources walk you through natively. They have guides for a ton of stuff that include the optimal settings so you don't have to figure it all out through trial and error.

Check out these "playbooks" for some examples. [0] There's a lot to be said for not having to piece all that together yourself.

[0] https://build.nvidia.com/spark

I think going from unboxing mine to setting it up headless and generating tokens took about 20 minutes total.

theYipster · 2026-06-23T17:34:56 1782236096

Not the new ones. Only the M1 and M2 have good support for Asahi. But you really don't need it. If you need Linux, use a VM (UTM is free and is equivalent to KVM/QEMU in speed, despite being a Type-2 Hypervisor.)

Fizz43 · 2026-06-23T00:42:26 1782175346

which mac is smoking the spark?

theYipster · 2026-06-23T17:39:45 1782236385

Mine, for one. M5 Max MacBook Pro 128GB with a 4TB SSD. $5100 after a $1000 discount at Microcenter. Great deal if you can find it in stock.

pmarreck · 2026-06-23T01:31:50 1782178310

pretty much any of them, dude, as long as you have enough RAM, since it uses unified RAM and a powerful SoC CPU/GPU. Literally any M-class model, but the M5 is currently top tier.

dannyw · 2026-06-23T03:22:47 1782184967

The DGX Spark has basically the same memory bandwidth as a M5 Pro, and far more than a M5.

Only the M3 Ultra really beats it, and once you start scoping out the cost of an M3 Ultra with 128GB or 256GB, the DGX Spark doesn’t look bad after all.

entrope · 2026-06-23T12:43:55 1782218635

> The DGX Spark has basically the same memory bandwidth as a M5 Pro, and far more than a M5.

I see ~274 GB/sec for the DGX Spark[1], versus 307 GB/sec for M5 Pro and 460 or 614 GB/sec for M5 Max[2]. One might call 90% "basically the same", but there are nominally two tiers above "Pro".

Yes, a MacBook Pro with 128 GB and M5 Max costs $5100 (14") or $5400 (16") versus currently $4700 for the DGX Spark, but the MBP includes keyboard, mouse, battery and portability. I believe its prefill is slower and you get 2 TB vs 4 TB SSD, but overall one gives up a lot to save 10% of the cost.

[1]- https://docs.nvidia.com/dgx/dgx-spark/hardware.html [2]- https://support.apple.com/en-us/126319
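The ratios behind those figures can be checked with a few lines. All numbers are the ones quoted in this comment; nothing here is measured.

```python
# Memory bandwidth in GB/s, as quoted from [1] and [2] above.
SPARK_GBPS = 274
M5_PRO_GBPS = 307
M5_MAX_GBPS = 614  # the higher of the two M5 Max figures

spark_vs_pro = SPARK_GBPS / M5_PRO_GBPS
max_vs_spark = M5_MAX_GBPS / SPARK_GBPS

# Prices in USD, as quoted above.
MBP_14_USD = 5100
SPARK_USD = 4700
saving = (MBP_14_USD - SPARK_USD) / MBP_14_USD

print(f"Spark is {spark_vs_pro:.0%} of M5 Pro bandwidth")
print(f"Top M5 Max is {max_vs_spark:.2f}x the Spark")
print(f"Spark saves {saving:.0%} versus the 14-inch MBP")
```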

pmarreck · 2026-06-23T12:51:08 1782219068

I looked, but a sibling comment just provided the links. ~274 GB/sec for the DGX Spark, vs. 307 GB/sec for M5 Pro, and max 614 GB/sec (!!!) for M5 Max? Why would you completely friggin’ lie about this, or at minimum, not double-check your facts before bullshitting? Plus, you get a full-fledged computer along with it!

Apple could actually be a good deal and you folks would still make up something to not justify it. In a way, it’s amazing what Apple has accomplished- Baseless negatively-tainted perception in certain influential tech circles.

(To be fair, they’re kind of earning it. I’m glad Tim “Sweet T” Cook is departing.)

Plus, my original comment got downvoted despite being factually-correct. Thanks, Reddit. Oh, wait…

mapontosevenths · 2026-06-23T02:08:03 1782180483

Yep. Memory bandwidth is what decides how fast LLMs generate tokens (mostly). The DGX Spark has something like 270 GB/s of memory bandwidth, and the m5 ultra is ~615 GB/s. Theoretically DOUBLE the speed. In practice he only generates like 25% more tok/s, but that's still very impressive.

The Spark can fine-tune models in 1/4 the time and excels at other compute tasks in ways that the Mac never can. Plus the high-bandwidth ConnectX-7 ports would be like $1700 to buy on a card just for the network adapters... But for generating tokens, it just plain loses.
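A back-of-envelope version of the "bandwidth decides tok/s" rule: if every generated token has to read all the weights once, the ceiling is bandwidth divided by model size. That is a simplifying assumption (dense model, no KV cache traffic, no compute limit), and the 60 GB model size is a hypothetical value picked for illustration. The two bandwidth figures are the ones quoted above.

```python
def decode_ceiling_tps(bandwidth_gb_per_s, weights_gb):
    """Upper bound on tokens/s if each token reads every weight once."""
    return bandwidth_gb_per_s / weights_gb

WEIGHTS_GB = 60  # hypothetical quantized model size

spark_tps = decode_ceiling_tps(270, WEIGHTS_GB)
mac_tps = decode_ceiling_tps(615, WEIGHTS_GB)

print(f"Spark ceiling: {spark_tps:.1f} tok/s")
print(f"Mac ceiling:   {mac_tps:.1f} tok/s ({mac_tps / spark_tps:.2f}x)")
```

The gap between this theoretical ratio and the roughly 25% reported in practice is a reminder that the ceiling is only one of several limits.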

fsuts · 2026-06-23T04:31:35 1782189095

How noisy does his fan get…

pmarreck · 2026-06-23T12:45:34 1782218734

it doesn’t get noisy at all

mapontosevenths · 2026-06-23T20:41:49 1782247309

In case anyone was wondering my spark is basically silent as well. It's great at being ignored, if that's really important to you. I've run mine completely headless since I bought it, including setup.

justincormack · 2026-06-23T10:23:46 1782210226

It is 2x200Gb/s physically but the PCIe bandwidth is basically only 200Gb/s so it may as well be one, and actually it's a weird 2xPCIe4 not 1xPCIe8 so it appears in software as dual 100Gb/s. It's a bit odd.

jauntywundrkind · 2026-06-23T01:20:14 1782177614

200 Gb / s (not GB/s)!

(Still potentially very useful! But not magically ultra fast.)
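For scale, a quick conversion of the link speed into bytes, next to the ~273 GB/s local memory bandwidth quoted elsewhere in this thread. This ignores protocol overhead, so the real figure would be somewhat lower.

```python
def gbit_to_gbyte(gbit_per_s):
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbit_per_s / 8

LINK_GBIT = 200        # network link speed discussed above, in Gb/s
SPARK_MEM_GBYTE = 273  # local memory bandwidth quoted in the thread, in GB/s

link_gbyte = gbit_to_gbyte(LINK_GBIT)
ratio = SPARK_MEM_GBYTE / link_gbyte

print(f"{LINK_GBIT} Gb/s = {link_gbyte:.0f} GB/s")
print(f"Local memory is about {ratio:.0f}x faster than the link")
```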

Computer0 · 2026-06-23T00:12:32 1782173552

128 GB of much slower RAM than Apple.

dannyw · 2026-06-23T03:23:56 1782185036

DGX Spark is ~273GB/s. That’s about M5 Pro territory, and twice as fast as the M5. You’d have to go to the M5 Max, or M3 Ultra, to get higher memory bandwidth than the Spark.

hajile · 2026-06-23T15:24:47 1782228287

If you are trying to get more than 64gb of RAM or doing tons of inferencing, you're getting a Max or Ultra anyway.

skiing_crawling · 2026-06-20T21:30:02 1781991002

unauthorized? Are they trying to find a middle ground word between "illegal" and "undocumented"?

1659447091 · 2026-06-20T22:22:32 1781994152

>> "Unauthorized Immigration. As noted earlier, we use unauthorized immigrants to refer to individuals who enter the U.S. without formal admission under immigration law. A large share of these individuals are encountered by federal authorities at ports of entry, along the border, or in the interior and are subsequently issued an NTA in immigration court, allowing them to seek asylum or otherwise challenge removal. [...]"

page 11

bluebarbet · 2026-06-20T22:20:32 1781994032

Probably and it seems a good compromise to me. Under asylum law, "illegal" is technically wrong until the final judgement is rendered. And "undocumented" is IMO an obvious manipulation of language (you would not call a doctor practicing without a medical license "undocumented"). Pending a decision on legality, "unauthorized" seems both neutral and correct.

dpe82 · 2026-06-20T21:42:09 1781991729

Seems like a fairly neutral term to me.

skiing_crawling · 2026-06-20T21:28:34 1781990914

Any generic abliterated or uncensored open-weight model (such as a Qwen variant) will happily comply with requests like this.

skiing_crawling · 2026-06-08T22:47:23 1780958843

Maybe some (or many) people believe that more people will make it less "lovely". I think this is a popular stance and I think many people are more than satisfied with the current population density of their area.

skiing_crawling · 2026-05-24T17:46:02 1779644762

I’m worried about giving a foreign hosted service access to my machine for a coding agent that can run arbitrary commands and read arbitrary files. Coding agents are much less useful if you have to sit there clicking approve on everything.

nicbou · 2026-05-24T17:55:57 1779645357

To many of us, American models are also foreign-hosted, and in an increasingly hostile nation.

skiing_crawling · 2026-05-24T18:09:48 1779646188

I guess I was speaking as an American. We have good domestically hosted options, so although it’s probably not ideal to send this kind of data/control anywhere at all, it’s definitely a worse option for us to send it to China than to an American company. Every user of this service has left their machine trivially exposed to becoming part of a botnet. I’m wondering why I don’t see this angle discussed more in here.

Again I’m not saying you should trust an American company necessarily more than a Chinese one, but as an American, I probably can.

Aeolos · 2026-05-24T18:26:36 1779647196

On the other hand, an American company can sell your chats to adtech/insurance/your government in ways that can harm you quite directly. Something worth considering.

coliveira · 2026-05-24T18:23:56 1779647036

As an American you should probably not trust them, because the giant American company has way more (legal or semi-legal) opportunities to manipulate your life than a Chinese one.

drstewart · 2026-05-24T22:15:41 1779660941

Good for you. So then you get it and also should distrust Chinese models for the same reason

1over137 · 2026-05-25T03:25:17 1779679517

>I’m worried about giving a foreign hosted service access to my machine...

So are the 96% of us humans that aren't USians.

skiing_crawling · 2026-05-25T04:21:36 1779682896

Does my concern somehow become less valid because I'm American? Everyone should be thinking carefully about which of their data is going where.

skiing_crawling · 2026-05-24T17:41:30 1779644490

I recently built a system at insane DDR4 prices ($2000 for 256GB). But that’s only after seeing that DDR5 prices were 3-4x that!

preisschild · 2026-05-24T17:51:45 1779645105

Yeah, I upgraded all of my systems to DDR5 last year, so now I have to pay for DDR5 memory upgrades.

Joel_Mckay · 2026-05-24T17:56:57 1779645417

Had to fork over almost $1k for a 64G DDR5 kit a few weeks back. At least AMD chips' large L3 cache allows folks to get away with lower grade udimms.

Also had to do an Intel build, and there was no way we were going cudimm at current prices. =3

skiing_crawling · 2026-05-22T02:08:02 1779415682

> Even if LLMs fail spectacularly

Haven't they already proven to be extremely useful? In some areas they are definitely here to stay, coding/software and search (retrieve and summarize information). There's a bunch of places where they are surely shoehorned in, overhyped, and don't belong, but there's also equally many places where they might still be transformative but aren't used yet.

But overall I think the technology is well proven.

stego-tech · 2026-05-22T02:16:48 1779416208

I always leave room open for failure, and that approach has generally served me well personally and professionally. I have never been punished for having an exit strategy.

Besides, the marketplace is still in its infancy for LLMs, with a lot of unanswered questions. A lot of those questions surround the commercial viability of frontier models on bespoke hyperscaler data centers with limited usage outside of LLMs specifically should those economics be non-viable. Since that's where the memory is being tied up into, that means it's a critical question to answer in order to determine long-term investment needs into further memory fabrication.

bigstrat2003 · 2026-05-22T09:35:50 1779442550

> Haven't they already proven to be extremely useful?

Most certainly not. The accuracy issues mean that they can't really be used effectively for coding or search, the two things you mentioned.

yxhuvud · 2026-05-22T12:05:01 1779451501

They work well functionally, but financially anything can still happen.

skiing_crawling · 2026-05-21T22:08:56 1779401336

I got an RTX 6000 pro too. I like running locally, I've learned a lot more than if I had used an API and there's less worry about overspending tokens. I accidentally spent $100 on claude api in like 2 days because I didn't know what I was doing.

The problem is that while one of these GPUs is a huge improvement over a laptop or a single 3090, you very quickly wish you had more. I would buy a second one, but I did the math and realized that with the current crop of models, 2 Blackwells don't buy me any new capability that I didn't have with one. So I would need a 3rd one. And when I buy a 3rd one I will feel like I want to run a higher quant, so then I will want a 4th.

CamperBob2 · 2026-05-21T23:20:50 1779405650

A pair of RTX6000 cards will give you a good performance boost due to tensor parallelism, though. I haven't tried the newest predictive quants but I see about 35 tps when running the 8-bit Qwen 3.6 27B model on one board and about 50 tps on two. Probably could come close to 100 tps on an optimized setup with the latest GGUFs.

Also, the 4-bit quants of MiniMax 2.7 will run at 100 tps or so with two cards, which is pretty decent. It doesn't go any faster at all with 4 GPUs from what I've seen, so if you don't actively need 384 GB of VRAM, 2x RTX6000 is a good place to be.
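To put the two-card boost in perspective, here is the scaling efficiency implied by the 35 tps and 50 tps figures above, meaning the fraction of an ideal 2x speedup actually reached. The figures are the commenter's; the helper name is made up for this sketch.

```python
def scaling_efficiency(single_tps, multi_tps, n_gpus):
    """Fraction of ideal linear speedup achieved when going to n_gpus."""
    return (multi_tps / single_tps) / n_gpus

speedup = 50 / 35
efficiency = scaling_efficiency(35, 50, 2)

print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```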

skiing_crawling · 2026-05-22T01:57:50 1779415070

You can get 70-80 tps on qwen3.6-27b f16 with MTP on a single card
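A quick footprint check for why a 27B model fits on one card even at f16. Weights take roughly parameters times bits per weight divided by 8; this ignores KV cache and activations, so real usage is higher. The per-card VRAM is derived from the "384 GB" across 4 GPUs mentioned in the comment above.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate weight memory in GB (1 GB = 1e9 bytes); weights only."""
    return params_billion * bits_per_weight / 8

CARD_VRAM_GB = 384 / 4  # 384 GB across 4 GPUs, per the thread

for name, bits in (("f16", 16), ("8-bit", 8), ("4-bit", 4)):
    gb = weight_gb(27, bits)
    verdict = "fits" if gb < CARD_VRAM_GB else "does not fit"
    print(f"27B at {name}: ~{gb:.1f} GB of weights, {verdict} in {CARD_VRAM_GB:.0f} GB")
```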

arjie · 2026-05-21T23:33:24 1779406404

You can fit Deepseek 4 Flash on two with TP 2 and 6 different streams at 65k context. 150 tok/s

Melatonic · 2026-05-22T06:59:28 1779433168

What kind of machine did you build around it ?

skiing_crawling · 2026-05-22T22:39:10 1779489550

Using an Epyc platform to get plenty of PCIe lanes and memory channels. I have a couple of extra 3090s plugged in which get some offload and help with larger models that don't fit entirely on the Blackwell.

skiing_crawling · 2026-05-20T17:23:31 1779297811

At this point IPOs are mainly for unloading bags onto retail. Every institution who wanted a piece of these labs got in years ago and captured all the value.

cryo32 · 2026-05-20T17:37:59 1779298679

Wise comment. 25 years working in PE showed me that retail investors are how you pay off losses.

lofaszvanitt · 2026-05-20T18:37:28 1779302248

Yeah, and now the "you shall not buy this" bullshit begins. And then the price soars. :D

ericmay · 2026-05-20T17:47:26 1779299246

Well, sad to say this is simply untrue for a few reasons.

1. "Retail" does not have enough purchasing power to have all of these "bags" unloaded on to.

2. Institutions buy shares in public firms post-IPO all the time even when they're "unloading bags onto retail". Take Uber (random example) ~83% is owned by institutions.

3. General factual history of the stock market shows that you are incorrect. Successful companies that IPO and continue to do business still have quite a lot of room left to grow. What was Google's market capitalization at IPO? What is it now? Is it possible some early investors made higher multiples than the IPO -> May 20th valuation? Yea for sure. That doesn't mean that all the value was captured. It also doesn't take into account the early stage risk for investing. Is Google an "at this point IPO"? No, but the principle is the same.

It's also worth mentioning however that the number of IPOs is going down over time. You could maybe argue that the only ones that actually IPO are all the bags, but that seems like a stretch.

These cynical comments "IPOs are mainly for unloading bags on to retail" lack explanatory power and data.

CodingJeebus · 2026-05-20T18:03:08 1779300188

It's absolutely true. Just look at how private equity is now getting access to public markets and retirement accounts[0]. You think PE is letting the little guys in out of the goodness of their hearts? No, they've extracted as much as they can and the market is starting to question the absurd valuation of private assets.

A wise man once said: "if you're given an opportunity to cut an amazing deal and you can't tell who's getting screwed, then it's probably you"

0: https://pestakeholder.org/news/trump-admin-bails-out-private...

ericmay · 2026-05-20T18:40:44 1779302444

> It's absolutely true.

What is absolutely true? I'm not sure specifically what you are referring to.

> Just look at how private equity is now getting access to public markets and retirement accounts[0].

Nobody forces you to reallocate your Vanguard Total Stock Market Index Fund or wherever you have your retirement assets into a new Apollo fund.

Secondarily, we should treat people like adults and allow them to make their own investment decisions.

hypeatei · 2026-05-20T17:57:45 1779299865

So I take it you're going to buy shares of OpenAI on opening day then? ;)

Institutions merely owning a newly-IPO'd stock means nothing. They get access to shares at a reasonable price before opening while retail is buying at insane prices after open. See Figma as an example where institutional investors got it at $33/share and it ended the IPO day at $115/share with retail buying all the way up (including pops above that at like $127)

I thought it was common knowledge that IPOs are a way for insiders and early investors (not IPO flippers) to get a nice exit during the frenzy.
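The Figma numbers above work out as follows. The prices are the ones quoted in this comment; the scenario of a retail buyer getting in at the $127 pop is a hypothetical added for illustration.

```python
def pct_gain(buy_price, sell_price):
    """Percentage gain (or loss, if negative) from buy_price to sell_price."""
    return (sell_price - buy_price) / buy_price * 100

IPO_PRICE = 33    # institutional allocation price, per the comment
DAY_CLOSE = 115   # first-day close, per the comment
INTRADAY_POP = 127

institution_gain = pct_gain(IPO_PRICE, DAY_CLOSE)
late_retail_gain = pct_gain(INTRADAY_POP, DAY_CLOSE)  # hypothetical buyer at the pop

print(f"IPO allocation to close: {institution_gain:+.0f}%")
print(f"Bought at the pop, held to close: {late_retail_gain:+.1f}%")
```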

ericmay · 2026-05-20T18:04:15 1779300255

> So I take it you're going to buy shares of OpenAI on opening day then? ;)

Probably not. Do you understand however that your comment does not make sense in the context of my comment?

> Institutions merely owning a newly-IPO'd stock means nothing. They get access to shares at a reasonable price before opening while retail is buying at insane prices after open. See Figma as an example where institutional investors got it at $33/share and it ended the IPO day at $115/share with retail buying all the way up (including pops above that at like $127)

It also doesn't mean nothing - you have to go and analyze any given stock to make these kinds of claims on a per-IPO/equity basis. You also are ignoring traders and trading algorithms run by... big institutions and trading firms, and you're not accounting for volume or accounting for post-IPO purchases nor breaking those down by segment. In other words, you're just making stuff up.

Danox · 2026-05-21T16:18:49 1779380329

Insiders get the best price before retail. What is there not to understand?

ericmay · 2026-05-21T17:00:24 1779382824

I understand that, but what I'm not understanding is why this seems to be a concern. I suppose equity given to early employees is a problem too and they're just "dumping their bags on retail" after their lockup period expires?

Earlier stage investors take risk and are rewarded for that. Most companies go bankrupt and folks lose their principal. For the companies that are successful yea some go bust after IPO - so what? Are you against public markets or something? That would at least be an interesting discussion.

Google IPO'd in 2004 and returned from what I'm reading about 6,500% after IPO (and this was in 2024, so the gains have gone up much higher since then) and all of that was the bags dumped on retail. If someone wants to dump their 6,500% return on me I'll take them up on that all day every day and twice on Sunday.

Being cynical is a recipe for poverty.
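For reference, the 6,500% figure above can be turned into an annualized rate. This sketch assumes the figure is a total return over the 20 years from 2004 to 2024 and ignores dividends and splits.

```python
def cagr(total_return_pct, years):
    """Annualized growth rate implied by a total percentage return."""
    multiple = 1 + total_return_pct / 100
    return multiple ** (1 / years) - 1

annual = cagr(6500, 2024 - 2004)
print(f"{annual:.1%} per year")
```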

danny_codes · 2026-05-22T03:02:05 1779418925

Retail includes people holding passive index funds.

skiing_crawling · 2026-05-19T19:10:27 1779217827

I use claude/gemini as my homepage now (I have to keep switching as these companies make "updates" that periodically render their models useless). Even if I want to search for simple things, I would rather have an LLM wade through the results and extract just the information I asked for. SEO, and now mountains of slop content, have made this necessary. It's only a matter of time before the SEO industry at large figures out how to game LLMs too, making them equally useless.

I already saw an article recently about how to set up a business domain which can reliably show up in a search result and dump overly positive reviews into anyone's context.