
Maybe only for their own models


Now any Google customer can use Trillium for training any model?


[Google employee] Yes, you can use TPUs in Compute Engine and GKE, among other places, for whatever you'd like. I just checked and the v6 are available.
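
For anyone who wants to try it, here's a minimal sketch of creating a TPU node with the google-cloud-tpu Python client. The accelerator-type and runtime-version strings for Trillium (v6e) are my assumptions, as are the project and zone values; check the TPU docs for what's actually available in your zone.

    # Minimal sketch: create a TPU VM node via the google-cloud-tpu client.
    # pip install google-cloud-tpu
    from google.cloud import tpu_v2

    client = tpu_v2.TpuClient()

    project = "my-project"   # hypothetical project ID
    zone = "us-central1-a"   # hypothetical zone; v6e availability varies
    parent = f"projects/{project}/locations/{zone}"

    node = tpu_v2.Node(
        accelerator_type="v6e-8",           # assumed Trillium type string
        runtime_version="v2-alpha-tpuv6e",  # assumed runtime; check the docs
    )

    operation = client.create_node(
        parent=parent,
        node_id="my-trillium-node",
        node=node,
    )
    print(operation.result())  # blocks until the node is ready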


Is there not going to be a v6p?


Can't speculate on futures, but here's the current version log ... https://cloud.google.com/tpu/docs/system-architecture-tpu-vm...


Google trained Llama-2-70B on Trillium chips


I thought Llama was trained by Meta.


> Google trained Llama

Source? This would make quite the splash in the market


It's in the article: "When training the Llama-2-70B model, our tests demonstrate that Trillium achieves near-linear scaling from a 4-slice Trillium-256 chip pod to a 36-slice Trillium-256 chip pod at a 99% scaling efficiency."
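
Spelling out the arithmetic in that quote (my reading, not the article's): a 4-slice pod of Trillium-256 is 1,024 chips and a 36-slice pod is 9,216, so perfectly linear scaling would be a 9x speedup, and 99% efficiency means about 8.9x in practice.

    # Back-of-the-envelope check of the scaling claim quoted above.
    chips_per_slice = 256
    small_pod = 4 * chips_per_slice    # 1,024 chips
    large_pod = 36 * chips_per_slice   # 9,216 chips

    ideal_speedup = large_pod / small_pod  # 9.0x if scaling were perfectly linear
    efficiency = 0.99                      # "99% scaling efficiency" per the article
    print(f"observed ~{ideal_speedup * efficiency:.2f}x vs ideal {ideal_speedup:.0f}x")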


I'm pretty sure they're doing fine-tune training, using Llama because it is a widely known and available sample. They used SDXL elsewhere for the same reason.

Llama 2 was released well over a year ago and was trained by Meta in partnership with Microsoft.


They can just train another one.


Llama 2's final weights are public. The data used to train it, and even the process used to train it, are not. Google can't just train another Llama 2 from scratch.

They could train something similar, but it'd be super weird if they called it Llama 2. They could call it something like "Gemini", or if it's open weights, "Gemma".


The article says they used maxtext to load the weights and pretrain on additional data. It looks like the instructions for doing that are here: https://github.com/AI-Hypercomputer/maxtext/blob/main/gettin...
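
For reference, a continued-pretraining launch in MaxText looks roughly like the sketch below. The config keys reflect my reading of the repo and may have changed; the gs:// paths, run name, and step count are hypothetical placeholders.

    # Rough sketch: continued pretraining on converted Llama-2-70B weights
    # with MaxText. Keys are my best reading of the repo's configs; all
    # gs:// paths and the run name are hypothetical.
    import subprocess

    subprocess.run(
        [
            "python3", "MaxText/train.py", "MaxText/configs/base.yml",
            "model_name=llama2-70b",
            # checkpoint previously converted from Meta's released weights:
            "load_parameters_path=gs://my-bucket/llama2-70b/checkpoint",
            "base_output_directory=gs://my-bucket/maxtext-runs",
            "run_name=llama2-continued-pretrain",
            "dataset_path=gs://my-bucket/my-dataset",
            "steps=1000",
            "per_device_batch_size=1",
        ],
        check=True,
    )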


They don't mean literally LLaMA. They mean a model with the same architecture.



