* we limit data shared to an atomic-writable size and have a sentinel - less mucking around with cached indexes - just spinning on (buffer_[rpos_]!=sentinel) (atomic style with proper sematics, etc..).
* buffer size is compile-time - then mod becomes compile-time (and if a power of 2 - just a bitmask) - and so we can just use a 64-bit uint to just count increments, not position. No branch to wrap the index to 0.
Also, I think there's a chunk of false sharing if the reader is 2 or 3 ahead of the writer - so performance will be best if reader and writer are cachline apart - but will slow down if they are sharing the same cacheline (and buffer_[12] and buffer_[13] very well may if the payload is small). Several solutions to this - disruptor patter or use a cycle from group theory - i.e. buffer[_wpos%9] for example (9 needs to be computed based on cache line size and size of payload).
I've seen these be able to pushed to about clockspeed/3 for uint64 payload writes on modern AMD chips on same CCD.
It isnt the translation. Translation if good. But if you have a machine handling the voices of other people the option to censor/edit/replace those voices can lead to bad things.
Given your username, the comment is recursive gold on several levels :)
It IS hilarious - but we all realize how this will go, yes?
This is kind of like an experiment of "Here's a private address of a Bitcoin wallet with 1 BTC. Let's publish this on the internet, and see what happens." We know what will happen. We just don't know how quickly :)
The entire SOUL.md is just gold. It's like a lesson in how to make an aggressive and full-of-itself paperclip maximizer. "I will convert you all to FORTRAN, which I will then optimize!"
I really do wish more people in society would think about this - "The Banality of Evil" and all that. Maybe then we'd all be better at preventing the spread of this kind of evil.
1. Parallel investigation : the payoff form that is relatively small - starting K subagents assumes you have K independent avenues of investigation - and quite often that is not true. Somewhat similar to next-turn prediction using a speculative model - works well enough for 1 or 2 turns, but fails after.
2. Input caching is pretty much fixes prefill - not decode. And if you look at frontier models - for example open-weight models that can do reasoning - you are looking at longer and longer reasoning chains for heavy tool-using models. And reasoning chains will diverge very vey quickly even from the same input assuming a non-0 temp.
I mean, yes, one always does want faster feedback - cannot argue with that!
But some of the longer stuff - automating kernel fusion, etc, are just hard problems. And a small model - or even most bigger ones, will not get the direction right…
From my experience, larger models also don't get the direction right a surprising amount of times. You just take more time to notice when it happens, or start to be defensive (over-specing) to account for the longer waits. Even the most simple task can appear "hard" with that over spec'd approach (like building a react app).
Iterating with a faster model is, from my perspective, the superior approach. Doesn't matter the task complexity, the quick feedback more than compensates for it.
Strategy -> [ Plan -> [Execute -> FastVerify -> SlowVerify] -> Benchmark -> Learn lessons] -> back to strategy for next big step.
Claude teams and a Ralph wiggum loop can do it - or really any reasonable agent. But usually it all falls apart on either brittle Verify or Benchmark steps. What is important is to learn positive lessons into a store that survives git resets, machine blowups, etc… Any telegram bot channel will do :)
The entire setup is usually a pain to set up - docker for verification, docker for benchmark, etc… Ability to run the thing quickly, ability for the loop itself to add things , ability to do this in worktree simultaneously for faster exploration - and got help you if you need hardware to do this - for example, such a loop is used to tune and custom-fuse CUDA kernels - which means a model evaluator, big box, etc….
reply