turbo_wombat's comments

turbo_wombat · 2025-09-20T05:59:14 1758347954

One of the big changes in the postwar era was that immigration was massively opened up in 1965. From 1924 to 1965 the US had very restrictive immigration laws, which led to labor shortages. Those shortages allowed unions to become strong, pushed wages up, and expanded the middle class. Since 1965 we've had declining union participation.

This is simple supply and demand. If you restrict the labor supply, the value of labor increases.
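
To make that concrete, here is a toy sketch with linear supply and demand curves. Every number and name in it is an illustrative assumption, not data, and it assumes the demand curve stays put while only the supply curve shifts.

```python
# Toy linear labor market. All parameters are made-up illustrative values.
# Demand for labor:  L = demand_intercept - demand_slope * wage
# Supply of labor:   L = supply_intercept + supply_slope * wage

def equilibrium_wage(demand_intercept: float, demand_slope: float,
                     supply_intercept: float, supply_slope: float) -> float:
    """Wage at which labor demanded equals labor supplied."""
    return (demand_intercept - supply_intercept) / (demand_slope + supply_slope)

# Same demand curve in both cases; only the supply curve moves.
open_supply_wage = equilibrium_wage(100.0, 2.0, 40.0, 1.0)
restricted_supply_wage = equilibrium_wage(100.0, 2.0, 10.0, 1.0)

print(open_supply_wage, restricted_supply_wage)
```

Shifting the supply curve inward (a lower `supply_intercept`) moves the crossing point to a higher wage, which is all the argument above needs.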

The same thing was observed after the Black Death, which killed off 30 to 50% of Europe's population. There were labor shortages, which increased the bargaining power of labor, and increased wages.

It's really funny how US companies suddenly start pretending they don't believe in supply and demand when it comes to labor.

incone123 · 2025-09-20T07:16:12 1758352572

England tried to impose wage controls after the Black Death. Results were mixed. https://en.m.wikipedia.org/wiki/Statute_of_Labourers_1351

turbo_wombat · 2025-09-03T19:59:31 1756929571

The original version of Ultima I was written in a mixture of BASIC and assembly. BASIC is pretty slow, but most BASIC implementations let you call into optimized assembly routines.

Though, past a certain point of complexity, performance aside, assembly might be more readable than BASIC because BASIC relied on line numbers for jumping around, whereas assemblers offered named labels.

CamperBob2 · 2025-09-03T22:46:19 1756939579

> The original version of Ultima I was written in a mixture of BASIC and assembly.

True that. Who could forget "One moment for house-cleaning!" as it ran a FRE(0) every time you saved.

(... uh, everybody?)

turbo_wombat · 2025-09-03T19:48:35 1756928915

They are comparing unoptimized PyTorch inference, something you would never deploy on a device, to a model with custom kernels.

Yes, of course the model with custom kernels is faster, whether it's written by a human or an AI.

Generally, PyTorch inference is meant to be used during the training process, and when running metrics, not when deploying. When deployed, you should export to ONNX, and then compile the ONNX to the native format of the device.

If you aren't familiar with the pipeline for ML deployment, this is the equivalent of comparing interpreted code to compiled code.
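
For anyone who wants the analogy made concrete, here is a toy sketch in plain Python. It is not PyTorch or ONNX code, and the names (`OPS`, `run_eager`, `compile_graph`) are hypothetical. It only shows the difference between dispatching each op on every call and fusing the whole graph into one function ahead of time.

```python
from typing import Callable, List, Tuple

# A toy "model": a list of (op name, constant) pairs applied to one number.
Op = Tuple[str, float]

OPS = {
    "add": lambda x, c: x + c,
    "mul": lambda x, c: x * c,
}
SYMBOLS = {"add": "+", "mul": "*"}

def run_eager(graph: List[Op], x: float) -> float:
    """Look up and dispatch every op on every call (the 'interpreted' path)."""
    for name, c in graph:
        x = OPS[name](x, c)
    return x

def compile_graph(graph: List[Op]) -> Callable[[float], float]:
    """Build one fused expression up front (the 'compiled' path)."""
    expr = "x"
    for name, c in graph:
        expr = f"({expr} {SYMBOLS[name]} {c!r})"
    # eval is acceptable here only because the graph is trusted toy input.
    return eval(f"lambda x: {expr}")

graph = [("mul", 2.0), ("add", 1.0), ("mul", 0.5)]
fused = compile_graph(graph)

# Both paths compute the same result; only the per-call overhead differs.
print(run_eager(graph, 3.0), fused(3.0))
```

A real export-and-compile pipeline does far more than this (operator fusion, memory planning, device-specific kernels), but the shape of the comparison is the same: per-op dispatch versus one precompiled unit.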

nserrino · 2025-09-03T21:15:56 1756934156

PyTorch is the baseline because that's what people prototype in, and the most common reference point. The aim here is to show that you can start from prototype code and automatically produce lower-level kernels (in this case Metal) that are more usable in real deployments, without additional work from the developer. Frontier models are capable of generating efficient Metal kernels automatically/immediately, and will only get better. We expect to see significant improvements as we refine the approach, but it's enough to show this seems to be a tractable problem for AI.

CapsAdmin · 2025-09-03T21:34:05 1756935245

I have never really worked with PyTorch professionally, but it feels to me like a lot of open source projects, especially generative ones, just use PyTorch like this. It makes hacking on the models a whole lot easier.

ComfyUI is a good example of a project like this.

yieldcrv · 2025-09-04T00:24:46 1756945486

> Yes, of course the model with custom kernels is faster, whether it's written by a human or an AI.

But that's the thing: I wouldn't have written a custom kernel before AI.

I don't do that level of development or operate at that part of the stack, but I'm very experienced in software development.

AI significantly augments my skill set in this area.

am17an · 2025-09-04T06:01:48 1756965708

The point is that those kernels exist already; you can just use them off the shelf. In the case where you're trying to write a production-grade kernel without operating at that part of the stack... well, good luck with that.

airforce1 · 2025-09-03T22:37:28 1756939048

> and then compile the ONNX to the native format of the device.

I'm assuming you are talking about https://github.com/onnx/onnx-mlir?

In your experience, how much faster is a "compiled" ONNX model vs. using an ONNX runtime?

dapperdrake · 2025-09-03T23:28:26 1756942106

For other people reading this:

Back in the day TensorFlow had tfdeploy, which compiled TensorFlow terms into NumPy matrix operations. Our synthetic tests saw speedups of a factor of 50.

spott · 2025-09-04T21:23:20 1757021000

vLLM is an LLM serving framework written using raw PyTorch.

ONNX doesn’t support a bunch of operations that PyTorch does (it isn’t always possible to convert a PyTorch model to ONNX).

TorchServe runs raw PyTorch.

Generally speaking, PyTorch is pretty well optimized. Mac has historically been neglected, so the kernels for MPS were either missing or just bad, but on CUDA and Linux they are pretty good.

turbo_wombat · 2025-08-22T21:50:39 1755899439

You are asking why save Intel of all chip manufacturers, and the answer is because there aren't any other major chip manufacturers in the US.

AMD no longer has a fab. TSMC dominates the global market and basically has no competition.

In the event that Taiwan is invaded, the US would suddenly have a huge problem getting access to any kind of high end chips, be they CPUs or GPUs. This would be a major problem economically and militarily for the US.

Some caveats: Due to the CHIPS Act, TSMC does now have fabs in Arizona, though I'm not sure what their capacity is. TI and some others building lower-end components also have fabs, I believe. For x86, high-end ARM, and GPUs, virtually all of that is manufactured by TSMC right now, mostly in Taiwan.

internetter · 2025-08-22T22:40:02 1755902402

> TSMC does now have fabs in Arizona, though I'm not sure what their capacity is.

180,000 wafers a year. Globally they do 17 million, so Arizona is roughly 1% of their output. They announced their first profit yesterday.

SJC_Hacker · 2025-08-24T05:16:43 1756012603

In the event that Taiwan is invaded, EVERYONE would suddenly have a huge problem getting access to any kind of high end chips, be they CPUs or GPUs.

China would not take over TSMC intact. Even if they did, they would not be able to operate it for quite some time (years), if ever.