I have one, and I love it. That said, my buddy's Mac smokes it for inference workloads in terms of tokens per second AND it's more usable for other things.
If you are training and doing research it's great, and if you want to cluster them it can't be beat, but if you just want local inference on a single box, buy a Mac or even a Strix Halo device.
Get your buddy with his smoking Mac to allow multiple concurrent connections and see how it gets on compared to your Spark. Don't ever use a single "chat" test to derive performance - try running say 10 or more.
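For anyone who wants to try that, here is a rough sketch of a concurrent load test that reports aggregate tokens per second rather than single-chat speed. `fake_generate` is a hypothetical stand-in (it just sleeps), not a real client API; you would swap in whatever call your inference server actually exposes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_generate(prompt: str) -> int:
    """Hypothetical backend: pretends to generate and returns a token count.

    Replace this with a real request to your inference server.
    """
    time.sleep(0.05)  # simulated latency, not a measurement
    return 32  # simulated number of generated tokens


def run_load_test(generate, prompts, concurrency):
    """Send all prompts using `concurrency` parallel workers.

    Returns (total_tokens, elapsed_seconds, aggregate_tokens_per_second).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        token_counts = list(pool.map(generate, prompts))
    elapsed = time.perf_counter() - start
    total = sum(token_counts)
    return total, elapsed, total / elapsed


if __name__ == "__main__":
    prompts = ["hello"] * 10
    for n in (1, 10):
        total, elapsed, tps = run_load_test(fake_generate, prompts, n)
        print(f"concurrency={n}: {total} tokens in {elapsed:.2f}s ({tps:.0f} tok/s)")
```

Run the same prompts at concurrency 1 and at 10 or more on both machines, and compare the aggregate figure, not the per-chat one.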
You might also notice that your Spark has a pair of QSFP28 or QSFP-DD (not sure yet) type interfaces as well as the 10Gb/s ethernet - that network card is a right old beast and adds quite a lot to the cost. It is capable of either 200 or 400Gb/s and can be split into two lots of four. Your mate's Mac probably has a wifi connection and is too cool for ethernet 8)
That NIC is there for a good reason - the Spark wants some friends to cluster with and you will absolutely spank any Mac when you spaff Mac style money on say three of these beasts and some cables and cluster them up. If you want four or more, you will need a switch and Mikrotik and others have them.
Casual "tokens per second" in AI is a bit like gamers whittering on about "ping" when they are using TCP and UDP for their games. ICMP request/response is a handy way of testing network paths and can give some indications towards potential performance limitations.
can those macs boot linux? i've heard about Asahi but have no idea how far along they are. i've got my fleet configured with nix and sure, nix can target darwin, but there's a _lot_ of sharp edges there: i don't really want to pull that thread unless i have to...
I don't know. I think he just uses LMStudio most of the time on his, but that's one place I can say the Spark really shines for me.
I'm a Linux guy, but also don't always have a lot of time. The Spark comes out of the box with a nice Linux distro that's pre-configured to be easy to set up, and the guides and online resources make getting up and running trivial, even for some complex tasks. You would have to do a LOT of tinkering just to figure out some of the things the NVIDIA resources walk you through natively. They have guides for a ton of stuff that include the optimal settings so you don't have to figure it all out through trial and error.
Check out these "playbooks" for some examples. [0] There's a lot to be said for not having to piece all that together yourself.
Not the new ones. Only the M1 and M2 have good support for Asahi. But you really don't need it. If you need Linux, use a VM (UTM is free and is equivalent to KVM/QEMU in speed, despite being a Type-2 Hypervisor.)
pretty much any of them, dude, as long as you have enough RAM, since it uses unified RAM and a powerful SoC CPU/GPU. Literally any M-class model, but the M5 is currently top tier.
The DGX Spark has basically the same memory bandwidth as an M5 Pro, and far more than an M5.
Only the M3 Ultra really beats it, and once you start scoping out the cost of an M3 Ultra with 128GB or 256GB, the DGX Spark doesn’t look bad after all.
> The DGX Spark has basically the same memory bandwidth as a M5 Pro, and far more than a M5.
I see ~274 GB/sec for the DGX Spark[1], versus 307 GB/sec for M5 Pro and 460 or 614 GB/sec for M5 Max[2]. One might call 90% "basically the same", but there are nominally two tiers above "Pro".
Yes, a MacBook Pro with 128 GB and M5 Max costs $5100 (14") or $5400 (16") versus currently $4700 for the DGX Spark, but the MBP includes keyboard, mouse, battery and portability. I believe its prefill is slower and you get 2 TB vs 4 TB SSD, but overall one gives up a lot to save 10% of the cost.
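For what it's worth, the ratios are easy to check. A quick sketch using only the figures quoted above (taken as given, not independently verified); the dictionary keys and helper names are just labels made up for this example:

```python
# Figures exactly as quoted in the comment above; not independently verified.
BANDWIDTH_GB_S = {"DGX Spark": 274, "M5 Pro": 307, "M5 Max (low)": 460, "M5 Max (high)": 614}
PRICE_USD = {"DGX Spark": 4700, "MBP 14-inch": 5100, "MBP 16-inch": 5400}


def bandwidth_ratio(a: str, b: str) -> float:
    """Memory bandwidth of device `a` as a fraction of device `b`."""
    return BANDWIDTH_GB_S[a] / BANDWIDTH_GB_S[b]


def spark_saving(macbook: str) -> float:
    """Fraction of the MacBook price saved by buying the Spark instead."""
    return 1 - PRICE_USD["DGX Spark"] / PRICE_USD[macbook]


if __name__ == "__main__":
    print(f"Spark vs M5 Pro bandwidth: {bandwidth_ratio('DGX Spark', 'M5 Pro'):.0%}")
    print(f"M5 Max (high) vs Spark:    {bandwidth_ratio('M5 Max (high)', 'DGX Spark'):.2f}x")
    for mbp in ("MBP 14-inch", "MBP 16-inch"):
        print(f"Saving vs {mbp}: {spark_saving(mbp):.1%}")
```

With those numbers the Spark lands at roughly 89% of the M5 Pro's bandwidth, and the saving is about 8% against the 14" and 13% against the 16", so "10%" is a fair midpoint.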
I looked, but a sibling comment just provided the links. ~274 GB/sec for the DGX Spark, vs. 307 GB/sec for M5 Pro, and max 614 GB/sec (!!!) for M5 Max? Why would you completely friggin’ lie about this, or at minimum, not double-check your facts before bullshitting? Plus, you get a full-fledged computer along with it!
Apple could actually be a good deal and you folks would still make up something to not justify it. In a way, it’s amazing what Apple has accomplished: a baseless, negatively tainted perception in certain influential tech circles.
(To be fair, they’re kind of earning it. I’m glad Tim “Sweet T” Cook is departing.)
Plus, my original comment got downvoted despite being factually correct. Thanks, Reddit. Oh, wait…
Yep. Memory bandwidth is what decides how fast LLMs generate tokens (mostly). The DGX Spark has something like 270 GB/s of memory bandwidth, and the M5 Max is ~615 GB/s. Theoretically more than DOUBLE the speed. In practice he only generates like 25% more tok/s, but that's still very impressive.
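The bandwidth argument can be put as a back-of-envelope bound: if every generated token has to stream all the weights from memory once, decode speed is at most bandwidth divided by model size in bytes. A sketch, where the 70B dense model at 4 bits per weight is purely an illustrative assumption (nobody in this thread named a model), and KV cache traffic, batching, and compute limits are all ignored:

```python
def max_decode_tokens_per_sec(bandwidth_gb_s: float,
                              params_billions: float,
                              bits_per_weight: float) -> float:
    """Upper bound on single-stream decode speed for a dense model.

    Assumes each token requires reading every weight from memory exactly once.
    """
    model_gb = params_billions * bits_per_weight / 8  # weights only, in GB
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    # Hypothetical model: 70B parameters, 4-bit weights (about 35 GB).
    for name, bw in (("~270 GB/s box", 270), ("~615 GB/s box", 615)):
        bound = max_decode_tokens_per_sec(bw, params_billions=70, bits_per_weight=4)
        print(f"{name}: at most {bound:.1f} tok/s")
```

The bound scales linearly with bandwidth, which is where the "theoretically double" comes from. Real numbers fall short of it on both machines, so a measured gap of around 25% is not a contradiction.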
The Spark can fine-tune models in 1/4 the time and excels at other compute tasks in ways that the Mac never can. Plus the high-bandwidth ConnectX-7 ports would be like $1700 to buy on a card just for the network adapters... But for generating tokens, it just plain loses.
In case anyone was wondering, my Spark is basically silent as well. It's great at being ignored, if that's really important to you. I've run mine completely headless since I bought it, including setup.