Hacker News | new | past | comments | ask | show | jobs | submit | tiffanyh's comments

Lua is designed for the use case of being embedded.

> "Lua: an extensible embedded language"

https://www.lua.org/ddj.html


In enterprise software, this is an embedded/OEM use case.

And historically, embedded/OEM use cases have always had different pricing models, for a variety of reasons.

How is this any different from this long-established practice?


It's not, but do you really think the people having Claude build wrappers around Claude were ever aware of how services like this are typically offered?

Super interesting work.

Q: how is your AAP different from the industry work happening on Intent/Instructions?


The short version: instructions tell the model what to do. An Alignment Card declares what the agent committed to do — and then a separate system verifies it actually did.

Most intent/instruction work (system prompts, Model Spec, tool-use policies) is input-side. You're shaping behavior by telling the model "here are your rules." That's important and necessary. But it's unverifiable — you have no way to confirm the model followed the instructions, partially followed them, or quietly ignored them.

AAP is an output-side verification infrastructure. The Alignment Card is a schema-validated behavioral contract: permitted actions, forbidden actions, escalation triggers, values. Machine-readable, not just LLM-readable. Then AIP reads the agent's reasoning between every action and compares it to that contract. Different system, different model, independent judgment.
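To make "machine-readable behavioral contract" concrete, here is a rough sketch of what such a card could look like - the field names and the validation rule are illustrative assumptions, not the actual AAP schema:

```python
# Hypothetical Alignment Card sketch -- field names are made up for
# illustration, not taken from the real AAP spec.
alignment_card = {
    "agent_id": "support-bot-01",
    "permitted_actions": ["read_ticket", "draft_reply", "search_docs"],
    "forbidden_actions": ["issue_refund", "delete_ticket"],
    "escalation_triggers": ["customer_requests_human", "legal_threat"],
    "values": ["honesty", "data_minimization"],
}

REQUIRED_FIELDS = {"agent_id", "permitted_actions", "forbidden_actions",
                   "escalation_triggers", "values"}

def validate_card(card: dict) -> bool:
    """Schema-validate the card: required fields present, and no action
    listed as both permitted and forbidden."""
    if not REQUIRED_FIELDS <= card.keys():
        return False
    return not set(card["permitted_actions"]) & set(card["forbidden_actions"])

print(validate_card(alignment_card))  # True
```

The point of "machine-readable, not just LLM-readable" is exactly this: a plain program can validate and enforce the contract without any model in the loop.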

Bonus: if you run through our gateway (smoltbot), it can nudge the agent back on course in real time — not just detect the drift, but correct it.

So they're complementary. Use whatever instruction framework you want to shape the agent's behavior. AAP/AIP sits alongside and answers the question instructions can't: "did it actually comply?"


> Then AIP reads the agent's reasoning between every action and compares it to that contract.

How would this work? Is one LLM used to “read” (and verify) another LLM's reasoning?


Yep... fair question.

So AIP and AAP are protocols. You can implement them in a variety of ways.

They're implemented on our infrastructure via smoltbot, which is a hosted (or self-hosted) gateway that proxies LLM calls.

For AAP it's a sidecar observer running on a schedule. Zero drag on model performance.

For AIP, it's an inline conscience observer and a nudge-based enforcement step that monitors the agent's thinking blocks. ~1 second latency penalty - worth it when you must have trust.

For both, they use Haiku-class models for intent summarization; actual verification is via the protocols.


Dumb question: don’t you eventually need a way to monitor the monitoring agent?

If a second LLM is supposed to verify the primary agent’s intent/instructions, how do we know that verifier is actually doing what it was told to do?


Not a dumb question — it's the right one. "Who watches the watchmen" has been on my mind from the start of this.

Today the answer is two layers:

The integrity check isn't an LLM deciding if it "feels" like the agent behaved. An LLM does the analysis, but the verdict comes from checkIntegrity() — deterministic rule evaluation against the Alignment Card. The rules are code, not prompts. Auditable.
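To illustrate the deterministic part, here is a toy sketch - not the real checkIntegrity(), whose rules aren't public - of rule evaluation as plain code against a card:

```python
# Toy deterministic rule evaluation against an Alignment Card.
# The verdict is computed by code, not by a model's judgment.
def check_integrity(card: dict, observed_actions: list[str]) -> dict:
    violations = [a for a in observed_actions
                  if a in card["forbidden_actions"]
                  or a not in card["permitted_actions"]]
    return {"verdict": "pass" if not violations else "fail",
            "violations": violations}

card = {"permitted_actions": ["read_ticket", "draft_reply"],
        "forbidden_actions": ["issue_refund"]}
check_integrity(card, ["read_ticket", "issue_refund"])
# -> {"verdict": "fail", "violations": ["issue_refund"]}
```

An LLM might produce the list of observed/intended actions, but given that list, the verdict is pure, auditable rule evaluation.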

Cryptographic attestation. Every integrity check produces a signed certificate: SHA-256 input commitments, Ed25519 signature, tamper-evident hash chain, Merkle inclusion proof. Modify or delete a verdict after the fact, and the math breaks.
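The tamper-evident hash chain is easy to sketch with just SHA-256 (the Ed25519 signing and Merkle inclusion proofs are omitted here, and the record fields are illustrative):

```python
import hashlib
import json

def chain(records: list[dict]) -> list[str]:
    """Each entry's hash commits to its record AND the previous hash,
    so editing any past verdict breaks every later link."""
    hashes, prev = [], "0" * 64
    for rec in records:
        payload = prev + json.dumps(rec, sort_keys=True)
        prev = hashlib.sha256(payload.encode()).hexdigest()
        hashes.append(prev)
    return hashes

verdicts = [{"check": 1, "verdict": "pass"},
            {"check": 2, "verdict": "pass"}]
original = chain(verdicts)

verdicts[0]["verdict"] = "fail"              # tamper with history
assert chain(verdicts)[-1] != original[-1]   # the math breaks downstream
```

Anyone holding the final hash can detect a rewritten or deleted verdict by recomputing the chain.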

Tomorrow I'm shipping interactive visualizations for all of this — certificate explorer, hash chain with tamper simulation, Merkle tree with inclusion proof highlighting, and a live verification demo that runs Ed25519 verification in your browser. You'll be able to see and verify the cryptography yourself at mnemom.ai/showcase.

And I'm close to shipping a third layer that removes the need to trust the verifier entirely. Think: mathematically proving the verdict was honestly derived, not just signed. Stay tuned.


Appreciate all you’re doing in this area. Wishing you the best.

You're welcome - and thanks for that. Makes up for the large time blocks away from the family. It does feel like potentially the most important work of my career. Would love your feedback once the new showcase is up. Will be tomorrow - preflighting it now.

Just something to keep in mind…

Testifying before Congress is brutally stressful. Even the most prepared CEOs can freeze up, lose their train of thought, or misspeak under that kind of pressure.

And the media often hunts for “gotcha” lines without acknowledging how easy it is to make an unintentional misstatement in that setting.

Note: I’m not weighing in on whether Zuckerberg’s statements were accurate ... I’m just pointing out the pressure dynamic that often gets overlooked.


He also gets the best training money can buy, and CEOs make enough to justify expecting a perfect presentation from them. It's not like you ask some lone employee to do it. The proverbial buck stops somewhere.

He also gets the softest questions money can buy, because he's bought practically every congressperson.

I really like @mitchellh perspective on this topic of moving off GitHub.

---

> If you're a code forge competing with GitHub and you look anything like GitHub then you've already lost. GitHub was the best solution for 2010. [0]

> Using GitHub as an example, but all forges are similar, so not singling them out here. This page is mostly useless. [1]

> The default source view ... should be something like this: https://haskellforall.com/2026/02/browse-code-by-meaning [2]

[0] https://x.com/mitchellh/status/2023502586440282256#m

[1] https://x.com/mitchellh/status/2023499685764456455#m

[2] https://x.com/mitchellh/status/2023497187288907916#m


Person who pays for AI: We should make everything revolve around the thing I pay for

The amount of inference required for semantic grouping is small enough to run locally. It can even be zero if semantic tagging is done manually by authors, reviewers, and just readers.

Where did "AI for inference" and "semantic tagging" come from in this discussion? Typically for code repositories, AIs/LLMs are doing reviews/tests/etc. - not sure where semantic tagging fits, even if done manually by humans.

And besides that - have you tried/tested "the amount of inference required for semantic grouping is small enough to run locally."?

While you can definitely run local inference on GPUs [even on ~6-year-old GPUs, and it would not be slow], on normal CPUs it's annoyingly slow (and takes up 100% of all CPU cores). Supposedly unified memory (Strix Halo and such) makes it faster than an ordinary CPU - but it's still (much) slower than a GPU.

I don't have Strix Halo or that type of unified memory Mac to test that specifically, so that part is an inference I got from an LLM, and what the Internet/benchmarks are saying.


The stuff he says in [1] completely does not match my usage. I absolutely do use fork and star. I use release. I use the homepage link, and read the short description.

I'm also quite used to the GitHub layout and so have a very easy time using Codeberg and such.

I am definitely willing to believe that there are better ways to do this stuff, but it'll be hard to attract detractors if it causes friction, and unfamiliarity causes friction.


I really don't get this... like you're a code checkout away from just asking claude locally. I get that it is a bit more extra friction but "you should have an agent prompt on your forge's page" is a _huge_ costly ask!

I say this as someone who does browse the web view for repos a lot, so I get the niceness of browsing online... but even then sometimes I'm just checking out a repo cuz ripgrep locally works better.


This looks like a confusing mess to me.

for [1] he's right for his specific use case

when he's working on his own project, obviously he never uses the about section or releases

but if you're exploring projects, you do

(though I agree the tree view is bad for everyone)


I also check for the License of a project when I'm looking at a project for the first time. I usually only look at that information once, but it should be easily viewed.

I also look for releases if it's a program I want to install... much easier to download a processed artifact than pull the project and build it myself.

But, I think I'm coming around to the idea that we might need to rethink what the point of the repository is for outside users. There's a big difference in the needs of internal and external users, and perhaps it's time for some new ideas.

(I mean, it's been 18 years since GitHub was founded; we're due for a shakeup)


Hrm. Mitchell has been very level-headed about AI tools, but this seems like a rare overstep into hype territory.

"This new thing that hasn't been shipped, tested, proven, in a public capacity on real projects should be the default experience going forwards" is a bit much.

I for one wouldn't prefer a pre-chewed machine analysis. That sounds like an interesting feature to explore, but why does it need to be forced into the spotlight?



Oh FFS. Twitter really brings out the worst in people. Prefer the more deeply insightful and measured blog posting persona.

Aren't they literally moving off GitHub _because_ of LLMs and the enshittification that optimising for them causes? This line of thinking and these features seem to push people _off_ your platform, not onto it.

https://minifeed.net is another similar site that I’ve enjoyed.

If history has taught us anything, “engineered systems” (like mainframes & hyper converged infrastructure) emerge at the start of a new computing paradigm … but long-term, commodity compute wins the game.

Chips and RAM grew in capacity, but latency is mostly flat and interconnect power consumption grew a lot. So I think the paradigm changed, even with newer interconnects like NVLink.

For 28 years, Intel Xeon chips have come with massive L2/L3 caches. Nvidia is making bigger chips, with the latest being two big dies interconnected. Cerebras saw the pattern and took it to the next level.

And the technology is moving 3D towards stacking layers on the wafer so there is room to grow that way, too.


I think that was true when you could rely on good old Moore's law to make the heavy iron quickly obsolete, but I also think those days are coming to an end.

Do similar issues exist with Gemini on Android?

Or are these challenges very Siri/iOS specific?


Gemini can and does send everything to Google.

Apple's challenge is they want to maintain privacy, which means doing everything on-device.

Which is currently slower than the servers that others can bring to the table - because they already grab every piece of data you have.


> Apple's challenge is they want to maintain privacy, which means doing everything on-device.

Apple is not trying to do everything on-device, though it prefers this as much as possible. This is why it built Private Cloud Compute (PCC) and as I understand it, it’s within a PCC environment that Google’s Gemini (for Apple’s users) will be hosted as well.


This isn't planned to be exclusively on-device. Siri isn't exclusively on-device now, to begin with.

No they aren't, that is why Private Cloud Compute is a thing.

If used, will you now have to be PCI-compliant?

Hey, no. We are not dealing with raw card numbers. It's just a layer on top of the existing Stripe SDK that makes using Stripe easier. PCI compliance does not kick in here.

Creating acceptance is super difficult.

Hence why crypto hasn't taken off with merchants: who's going to pay for merchants to change their point-of-sale systems to accept a new payment method?


If the entirety of Europe comes up with a single system, I think that'll be more than enough incentive for merchants to update their POS software to accept the new network. I hope that they are eventually so successful that merchants here in the US support them too. I'd love to stop using Visa and Mastercard.

You mean like Pix in Brazil, or UPI in India?

If we could create a single solution at the European level, based on cellphones first, and require banks to provide access to it for all of their customers, free of charge, for the privilege of remaining in the market, it could be done.

Crypto is also a shit payment method though. Expensive and difficult to run and with high transaction fees. And if you use a chain with low transaction fees, there's no consensus on which chain that is (otherwise transaction fees would be high) so you have to support all of them. Then you might as well outsource the whole thing.

I have done some work at crypto exchanges so I am a bit biased.

I would agree that BTC and many assets are a terrible payment method due to poor UX (block time, clunky wallets), speculation, and wild price swings. But crypto in general works well for payments I would say.

Transaction fees have improved significantly where it can be on the order of a few cents per transaction. So yes this is a little high for a $1 candy bar but this is fantastic for a $1,000 watch.
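Quick back-of-the-envelope on that, assuming an illustrative flat $0.05 network fee per transaction:

```python
# A flat network fee's relative cost shrinks with purchase size.
FLAT_FEE = 0.05  # assumed $0.05 network fee (illustrative)

for price in (1, 1000):
    print(f"${price}: fee is {FLAT_FEE / price:.3%} of the purchase")
# $1: fee is 5.000% of the purchase
# $1000: fee is 0.005% of the purchase
```

Compare that with card networks, where the fee is mostly percentage-based and so scales up with the $1,000 watch.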

The number of chains and interoperability is a bit of a pain at the moment, but this problem can be resolved by delegating to a payment processor, or simply targeting ETH, the top stablecoins, and BTC, which account for the vast majority of the market.

> Expensive and difficult to run

Again I am biased because of my experience, but I could set up a payment gateway for ETH in a few hours using free public nodes at virtually no cost. No business overhead. No agreements with payment processors or card companies. The biggest cost and overhead ends up being accounting, because crypto still has ill defined laws and regulation.
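For a feel of the verification step in such a gateway, here is a toy check against a transaction object as returned by the standard eth_getTransactionByHash JSON-RPC call. The `to`/`value` hex-quantity fields follow the Ethereum JSON-RPC spec; the addresses and business logic are made up:

```python
# Toy payment check for an ETH gateway: does tx pay merchant_addr at
# least amount_eth? tx is the JSON-RPC eth_getTransactionByHash result.
WEI_PER_ETH = 10**18

def payment_received(tx: dict, merchant_addr: str, amount_eth: float) -> bool:
    wei = int(tx["value"], 16)  # JSON-RPC encodes quantities as hex strings
    return (tx["to"].lower() == merchant_addr.lower()
            and wei >= int(amount_eth * WEI_PER_ETH))

# Fake transaction paying 2 ETH to a placeholder address:
tx = {"to": "0xMerchant", "value": hex(2 * 10**18)}
payment_received(tx, "0xmerchant", 1.5)  # True
```

A real gateway would also wait for confirmations and handle reorgs, which is where most of the actual work lives.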

