Not common in Silicon Valley, but much more common in the rest of the country.
There’s an archetype for bootstrapped tech businesses:
- highly vertical-specific
- couple hundred million TAM
- founder started the business in their 30s and is now in their 40s
It’s a tensor stored in GPU memory to improve inference throughput. Check out the PagedAttention paper (which introduced vLLM) for how most systems implement it nowadays.
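The idea is simple enough to sketch in a few lines: cache each past token’s key/value projections so decoding step t only computes attention for the new token instead of re-running the whole prefix. This is a toy illustration in plain Python (real systems keep these as paged GPU tensors, per the vLLM paper); the class and names are made up for the example:

```python
import math

def attention(q, ks, vs):
    # Single-head attention of one query vector over all cached keys/values.
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in ks]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, vs)) for i in range(d)]

class KVCache:
    # Append-only: each decode step adds one (key, value) pair, so past
    # tokens' K/V projections are never recomputed.
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        self.keys.append(k)
        self.values.append(v)
        return attention(q, self.keys, self.values)

cache = KVCache()
out1 = cache.step([1.0, 0.0], [1.0, 0.0], [2.0, 0.0])  # 1 token cached
out2 = cache.step([1.0, 0.0], [0.0, 1.0], [0.0, 3.0])  # 2 tokens cached
```

The memory cost is what PagedAttention actually targets: the cache grows linearly with sequence length per request, and paging it lets the server pack many requests into GPU memory without fragmentation.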
I might be missing something, but DeepSeek’s recipe is right there in plain sight. Most of the cost efficiency of DeepSeek V3 seems to be attributable to MoE and FP8 training. DeepSeek R1’s improvements are from GRPO-based RL.
Interesting to note: we have no idea how much R1 cost to train.
To speculate - maybe DeepSeek’s release made an upcoming Llama release moot in comparison.
They slightly restructure their MoE [1], but I think the main difference is that other big models (e.g. Llama 405B) are dense and have higher FLOP requirements. MoE should represent a ~5x improvement; FP8 should be about a ~2x improvement.
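The source of the MoE saving is easy to see in code: a router picks the top-k of E expert FFNs per token, so only k expert forward passes run instead of one giant dense layer. A toy sketch (the gating and shapes are illustrative, not DeepSeek’s exact formulation):

```python
import math, random

random.seed(0)
E, K, D = 8, 2, 4  # total experts, active experts per token, hidden dim

# Toy "experts": each scales its input by a different constant, standing in
# for a full FFN. Real experts are large MLPs, which is why skipping E-K of
# them per token is the win.
experts = [lambda x, s=i + 1: [s * xi for xi in x] for i in range(E)]
gate_w = [[random.gauss(0, 1) for _ in range(E)] for _ in range(D)]

def moe_forward(x):
    # Router logits for every expert, but softmax and compute only over top-k.
    logits = [sum(x[d] * gate_w[d][e] for d in range(D)) for e in range(E)]
    topk = sorted(range(E), key=lambda e: logits[e], reverse=True)[:K]
    m = max(logits[e] for e in topk)
    exps = {e: math.exp(logits[e] - m) for e in topk}
    z = sum(exps.values())
    out = [0.0] * D
    for e in topk:  # only K expert forward passes, not E
        y = experts[e](x)
        out = [o + (exps[e] / z) * yi for o, yi in zip(out, y)]
    return out, topk

out, active = moe_forward([1.0, -0.5, 0.3, 0.2])
```

Per-token compute scales with the active parameters (k experts), while capacity scales with total parameters (all E), which is roughly where the cost multiple over a dense model of similar quality comes from.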
We don’t know how much of a speed improvement GRPO represents. They didn’t say how many GPU hours went into RL training for DeepSeek-R1, and we don’t have o1 numbers to compare against.
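For context on why GRPO is cheaper per step than PPO-style RLHF: per the DeepSeekMath paper, it drops the learned value/critic network and instead samples a group of completions per prompt, using the group-normalized reward as the advantage. The core computation is just this:

```python
import statistics

def group_advantages(rewards):
    # GRPO advantage: A_i = (r_i - mean(r)) / std(r), computed within the
    # group of completions sampled for one prompt. No critic network needed.
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mu) / sigma for r in rewards]

# Four sampled completions for one prompt; two passed a verifier, two failed.
advs = group_advantages([0.0, 1.0, 1.0, 0.0])
```

That removes the memory and compute of a second model the size of the policy, but it says nothing about how many rollouts R1 needed, which is the number we don’t have.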
There’s definitely lots of misinformation spreading, though. The $5.5M number refers to DeepSeek-V3, not DeepSeek-R1. I don't want to take away from High-Flyer's accomplishment, though. I think a lot of these innovations were forced by working around H800 networking limitations, and it's impressive what they've done.
It's interesting that having access only to less powerful hardware motivated (even necessitated) more efficient training -- like how tariffs can backfire if left in place too long.
LLMs are inherently bad at this due to tokenization, scaling, and lack of training on the task. Anthropic’s computer use feature has a specialized model for pixel-counting:
> Training Claude to count pixels accurately was critical. Without this skill, the model finds it difficult to give mouse commands. [1]
For a VLM trained on identifying bounding boxes, check out PaliGemma [2]
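If you do try PaliGemma for detection, its output encodes each box as four `<locNNNN>` tokens (values 0-1023, in y_min, x_min, y_max, x_max order, normalized to the image size). A hedged parser sketch -- double-check the model card for the exact format before relying on this:

```python
import re

LOC = re.compile(r"<loc(\d{4})>")

def parse_boxes(text, width, height):
    # Groups consecutive <locNNNN> values four at a time and rescales the
    # 0-1023 normalized coordinates to pixel coordinates.
    vals = [int(v) for v in LOC.findall(text)]
    boxes = []
    for i in range(0, len(vals) - 3, 4):
        ymin, xmin, ymax, xmax = vals[i:i + 4]
        boxes.append((
            xmin / 1024 * width, ymin / 1024 * height,
            xmax / 1024 * width, ymax / 1024 * height,
        ))
    return boxes

# Hypothetical model output for a "detect button" prompt on a 1024x768 image.
boxes = parse_boxes("<loc0256><loc0128><loc0512><loc0896> button", 1024, 768)
```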
You may also be able to get the computer use API to draw bounding boxes if the costs make sense.
That said, I think the correct solution is likely to use a non-VLM to draw bounding boxes. It depends on the dataset and problem.
PaliGemma on computer use data is absolutely not good. The difference between a FT YOLO model and a FT PaliGemma model is huge if generic bboxes are what you need. Microsoft's OmniParser also winds up using a YOLO backbone [1]. All of the browser use tools (like our friends at browser-use [2]) wind up trying to get a generic set of bboxes using the DOM and then applying generative models.
PaliGemma seems to fit into a completely different niche right now (VQA and Segmentation) that I don't really see having practical applications for computer use.
Conditionally, yes. There are many libraries that cannot be tree-shaken for various reasons. Libraries typically need to stick to a statically analyzable subset of JS (e.g. side-effect-free ES module exports) so bundlers can prove which code is unused.
GraphQL is very powerful when combined with Relay. It’s useless extra bloat if you just use it like REST.
The difference between the two technologies is that LangChain was developed and funded before anyone knew what to do with LLMs, while GraphQL was internal tooling built to solve a real problem at Meta.
In a lot of ways, LangChain is a poor abstraction because the layer it’s abstracting was (and still is) in its infancy.