Show HN: Shimmy – 5MB privacy-first, local alternative to Ollama (680MB)

MKuykendall · 2025-09-04T18:10:31 1757009431

Hey HN! I built this because I was tired of waiting 10 seconds for Ollama's 680MB binary to start just to run a 4GB model locally.

Quick demo - working VSCode + local AI in 30 seconds: curl -L https://github.com/Michael-A-Kuykendall/shimmy/releases/late... ./shimmy serve # Point VSCode/Cursor to localhost:11435

The technical achievement: Got it down to 5.1MB by stripping everything except pure inference. Written in Rust, uses llama.cpp's engine.

One feature I'm excited about: You can use LoRA adapters directly without converting them. Just point to your .gguf base model and .gguf LoRA - it handles the merge at runtime. Makes iterating on fine-tuned models much faster since there's no conversion step.

Your data never leaves your machine. No telemetry. No accounts. Just a tiny binary that makes GGUF models work with your AI coding tools.

Would love feedback on the auto-discovery feature - it finds your models automatically so you don't need any configuration.

What's your local LLM setup? Are you using LoRA adapters for anything specific?

carlos_rpn · 2025-09-04T19:08:29 1757012909

You may have noticed already, but the link to the binary is throwing a 404.

MKuykendall · 2025-09-04T20:18:41 1757017121

This should be fixed now!

sunscream89 · 2025-09-05T21:25:12 1757107512

How do I use it with ollama models?

MKuykendall · 2025-09-06T01:30:29 1757122229

To use Shimmy (instead of Ollama):

  1. Install Shimmy:
  cargo install shimmy
  2. Get GGUF models (same models you'd use with Ollama):
  # Download to ./models/ directory
  huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf --local-dir
   ./models/
  # Or use existing Ollama models from ~/.ollama/models/
  3. Start serving:
  ./shimmy serve
  4. Use with any OpenAI-compatible client at http://localhost:11435

sunscream89 · 2025-09-06T12:28:50 1757161730

I am trying to use ~/.ollama/models/, even linked it to ~/models. I don’t have phi-3, it may be possible none of my models are supported. It acts as though it sees nothing.

How do I know for sure it is checking ~/.ollama/models/ (if linking isn’t the right approach.)

MKuykendall · 2025-09-06T14:54:33 1757170473

I didn't have that path set to autodiscover; pull the newest version this is fixed now!!

homarp · 2025-09-04T18:15:19 1757009719

Nice, a rust tool wrapping llama.cpp

how does it differ from llama-server?

and from llama-swap?

MKuykendall · 2025-09-04T18:43:44 1757011424

Shimmy is designed to be "invisible infrastructure" - the simplest possible way to get local inference working with your existing AI tools. llama-server gives you more control, llama-swap gives you multi-model management.

  Key differences:
  - Architecture: llama-swap = proxy + multiple servers, Shimmy = single server
  - Resource usage: llama-swap runs multiple processes, Shimmy = one 50MB process
  - Use case: llama-swap for managing many models, Shimmy for simplicity

MKuykendall · 2025-09-04T18:52:54 1757011974

Shimmy is for when you want the absolute minimum footprint - CI/CD pipelines, quick local testing, or systems where you can't install 680MB of dependencies.

stupidgeek314 · 2025-09-05T02:26:53 1757039213

Windows Defender tripped this for me, calling it out as Bearfoos trojan. Most likely a false positive, but jfyi.

MKuykendall · 2025-09-05T13:17:18 1757078238

Try cargo install or intentionally exclude, unsigned Rust binaries will do this.

cat-turner · 2025-09-05T10:43:26 1757069006

looks cool, ty! really great project will try this out.