Also, there is zero reason to think that the big labs haven't had something similar to TurboQuant for a long time already.

The recent blog post from Google announcing TurboQuant does not change anything regarding RAM planning for the big labs.

TurboQuant itself is already a year old! So even smaller labs have probably seen and implemented it.


TurboQuant's specific benefit is compressing the KV cache at a negligible cost to quality. That mainly means context lengths can go up for the same amount of memory; however, the KV cache only accounts for something like 20% of the overall memory footprint, so this will not dramatically decrease memory demands in the way that some of the more sensationalist reporting has stated.
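
Back-of-envelope for the sizes involved (all dimensions below are made up, roughly shaped like a 70B-class model with GQA, not taken from the paper):

    # Rough KV cache sizing; every dimension here is hypothetical.
    layers, kv_heads, head_dim = 80, 8, 128
    bytes_fp16 = 2

    # K and V are each cached once per layer, per token.
    per_token = 2 * layers * kv_heads * head_dim * bytes_fp16   # ~320 KiB
    ctx = 32_768
    cache_gb = per_token * ctx / 1e9          # ~10.7 GB per sequence
    weights_gb = 70e9 * bytes_fp16 / 1e9      # ~140 GB of weights in fp16
    print(f"{cache_gb:.1f} GB cache vs {weights_gb:.0f} GB weights")

The cache grows with context length and with every concurrent sequence in the batch, while the weights are paid once, so its share gets much larger in serving.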

For large providers, KV caches are the main bottleneck, no?

The open-source tooling got quantization support 3 years ago! It was a lesser type of quantization, but more than enough to prove that the savings just go to bigger models.

Cheapest I know of is around $96k.

The memory bandwidth on the M4 Max is 546 GB/s and on the M5 Max 614 GB/s, so not a huge jump.

The new tensor cores, sorry, "Neural Accelerators", only really help with prompt preprocessing, aka prefill, and not with token generation. Token generation is memory bound.
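
Back-of-envelope for why (the active parameter count and quantization below are hypothetical, just to show the shape of the math):

    # Decode is memory bound: every generated token streams (roughly)
    # all active weights from memory once, so bandwidth sets a hard
    # ceiling on tokens/s no matter how fast the matmuls are.
    bandwidth = 614e9        # M5 Max, bytes/s
    active_params = 35e9     # hypothetical ~35B active parameters
    bytes_per_param = 0.5    # 4-bit quantization

    weight_bytes = active_params * bytes_per_param
    print(f"<= {bandwidth / weight_bytes:.0f} tok/s ceiling")  # ~35 tok/s

Bigger tensor cores don't move that ceiling; only more bandwidth does.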

Hopefully the Ultra version (if it exists) has a bigger jump in memory bandwidth and maximum RAM.


Do any frameworks manage to use the neural engine cores for that?

Most stuff ends up running Metal -> GPU, I thought.


It's referring to the neural cores (for matrix multiplication) in the GPU itself, not the NPU.

https://creativestrategies.com/research/m5-apple-silicon-its...



I noticed that even on my M3, MLX tends to do prefill a lot faster than llama.cpp with GGML models. Does anyone know how they do it?


Related to this: how do you get the comments you add in the review back into your agent (Claude Code, Cursor, Codex, etc.)? Everybody talks about AI doing the code review, but I want a solution for the inverse: I review the AI's code, and it should then go away, fix all the comments, and update the PR.


What you do is actually read the comments, think about how you can improve the code, and then improve it, whether by telling the agent to do that or by doing it yourself.


There’s a bunch of versions of this out there. This one’s mine, but it’s based on other ones. It works really well. It assesses the validity and importance of each comment, then handles it appropriately, creating issues, fixing the code, adding comments, updating the GH Copilot instructions file, etc.

https://github.com/cboone/cboone-cc-plugins/blob/main/plugin...


I tell claude code “review the comments on this PR” and give it the url, and that’s enough. It then uses the gh cli tool and fetches the PR and individual comments.


I suspect you don't need anything special for this. The GH API has support for reading comments from PRs. Maybe have it maintain a small local store to remember the IDs of the comments it's already read so it doesn't try to re-implement already-implemented fixes. Another similar thing you can do is a hook that reminds it to start a subagent to monitor the CI/autofix errors after it creates/updates a PR.


The GitHub API is actually quite tricky here because there is a difference between "comment" and "review" and "review comment" (paraphrasing, I don't remember the details). So it's not as simple as one API call that grabs the markdown. Of course you can write a creative one-liner to extract what you need, though.
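
For reference, the distinction is real, and the three flavors live at three different REST endpoints. A minimal sketch using the documented REST API (owner/repo/PR number and the token env var are placeholders):

    import os, requests

    owner, repo, pr = "OWNER", "REPO", 1  # placeholders
    h = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}
    base = f"https://api.github.com/repos/{owner}/{repo}"

    # 1. Top-level conversation comments (a PR is an issue underneath):
    conversation = requests.get(f"{base}/issues/{pr}/comments", headers=h).json()
    # 2. Reviews (approve / request changes, with an optional summary body):
    reviews = requests.get(f"{base}/pulls/{pr}/reviews", headers=h).json()
    # 3. Review comments (the inline, line-anchored diff comments):
    inline = requests.get(f"{base}/pulls/{pr}/comments", headers=h).json()

    for c in conversation + inline:
        print(c["user"]["login"], ":", c["body"][:80])

The gh CLI can hit the same paths, e.g. gh api repos/OWNER/REPO/pulls/1/comments.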


I don't use it, but you can tag @copilot on GitHub comments and it will do so.

I don't do it because the chances of me reviewing vomited code are close to 0.


I used Claude Opus 4.5 inside Cursor to write RISC-V Vector/SIMD code. Specifically Depthwise Convolution and normal Convolution layers for a CNN.

I started out by letting it write a naive C version without intrinsics, and validated it against the PyTorch version.
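
A toy version of that validation setup (not my actual code, just the shape of it): depthwise convolution applies one filter per channel, and groups=channels makes PyTorch's conv2d depthwise, so the reference check is only a few lines.

    import numpy as np
    import torch
    import torch.nn.functional as F

    def depthwise_naive(x, w):
        """x: (C, H, W), w: (C, kH, kW); stride 1, no padding."""
        C, H, W = x.shape
        _, kH, kW = w.shape
        out = np.zeros((C, H - kH + 1, W - kW + 1), dtype=x.dtype)
        for c in range(C):                       # one filter per channel
            for i in range(out.shape[1]):
                for j in range(out.shape[2]):
                    out[c, i, j] = np.sum(x[c, i:i+kH, j:j+kW] * w[c])
        return out

    x = np.random.randn(16, 32, 32).astype(np.float32)
    w = np.random.randn(16, 3, 3).astype(np.float32)
    ref = F.conv2d(torch.from_numpy(x)[None], torch.from_numpy(w)[:, None],
                   groups=16)[0].numpy()
    assert np.allclose(depthwise_naive(x, w), ref, atol=1e-4)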

Then I asked it (and two other models, Gemini 3.0 and GPT 5.1) to come up with some ideas on how to make it faster using SIMD vector instructions and write those down as markdown files.

Finally, I started the agent loop by giving Cursor those three markdown files, the naive C code and some more information on how to compile the code, and also an SSH command where it can upload the program and test it.

It then tested a few different variants, ran them on the target (a RISC-V SBC, the OrangePI RV2) to check if they improved runtime, and continued from there. It did this 10 times, until it arrived at the final version.

The final code is very readable, and faster than any other library or compiler output that I have found so far. I think the clear guardrails (output has to exactly match the reference output from PyTorch, performance must be better than before) make this work very well.


I am really surprised by this. While I know it can generate correct SIMD code, getting a performant version is non-trivial, especially for RVV, where the instruction choices and the underlying microarchitecture significantly impact performance.

IIRC, depthwise convolution is memory-bound, so the bar might be lower. Perhaps you can try something with higher compute intensity, like a matrix multiply. I have observed that it trips up on columnar accesses with SIMD.


I think the ability to actually run the code on the target helped a lot with understanding and optimizing for the specific microarchitecture. Quite a few of the ideas turned out not to be optimal and were discarded.

It's also important to have a few test cases the agent can quickly check against. It will often generate wrong code, but if that is easily detectable, the agent can fix it and continue quickly.


Can you share the code?


Seems like the hiker at the bottom of the article was introduced in 1997 and removed only in 2017: https://s.geo.admin.ch/be66brq5oby9


I have the PCIe version of NanoKVM, and I am also happy with it.

The big advantage of the PCIe version is that it does not take up space on the desk, and all the cables for ATX power control are inside the PC case.

Full-sized HDMI is nice; the only limitation here is the 1080p resolution. 1440p or higher would allow mirroring the main monitor's output to the NanoKVM, but this is probably a weird use case anyway.


It says in the second sentence of the Hugging Face page that there are multiple sizes: https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct

You won't be out of work creating ggufs anytime soon :)


:)


The VPN product is very good; it's basically a thin wrapper around Mullvad, arguably the best VPN on the planet right now, at least from a privacy standpoint.


It's not anymore. They blocked port forwarding, which interferes with torrenting, and are moving away from OpenVPN, which I need.

In my opinion they are well on their way to enshittification, and I moved to ProtonVPN.


Why do you think it's a negative result? The table on page 9 shows great results.


I think it's a pun. AlphaZero? AlphaNegative.


-273°C, isn't it?

