
> llama.cpp which runs a 13 billion parameter model on a 6GB GPU

I think that's a typo there too: the 13B model needs something like 10G of memory at 4 bits, and it's the 7B one that fits into 6G. Well, unless you do the split thing with some layers on the CPU, I guess.
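Back-of-the-envelope, assuming llama.cpp's q4_0 format (each block of 32 weights is a 2-byte scale plus 32 nibbles, so ~4.5 bits/weight); a rough sketch, not exact numbers:

    # Rough VRAM needed just for q4_0 weights:
    # 18 bytes per 32-weight block = 2-byte scale + 32 nibbles,
    # i.e. ~4.5 bits per weight on average.
    BITS_PER_WEIGHT = 4.5

    def weight_gb(n_params: float) -> float:
        return n_params * BITS_PER_WEIGHT / 8 / 1e9

    for name, n in [("7B", 7e9), ("13B", 13e9)]:
        print(f"{name}: ~{weight_gb(n):.1f} GB of weights")
    # 7B:  ~3.9 GB -> fits in 6G with room for context
    # 13B: ~7.3 GB -> closer to ~10G once the KV cache and
    #       scratch buffers are added on top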

Yeah, that's the split-layer mode I mentioned. With 6G you can do about 18 layers, which is less than half of the 40 total for 13B.
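Rough per-layer math, treating the ~7.3 GB of 13B q4 weights as spread evenly over the 40 layers (optimistic, since it ignores the embedding/output matrices), shows why 18 is about the ceiling once you budget for the KV cache and scratch buffers:

    # Hypothetical even split of 13B q4_0 weights across layers.
    total_weight_gb = 7.3
    n_layers = 40
    gpu_layers = 18   # e.g. llama.cpp's --n-gpu-layers (-ngl) 18

    per_layer = total_weight_gb / n_layers
    print(f"~{per_layer:.2f} GB/layer, ~{per_layer * gpu_layers:.1f} GB on GPU")
    # ~0.18 GB/layer, ~3.3 GB of weights on the GPU; the rest of
    # the 6G goes to the KV cache and scratch buffers, and the
    # other 22 layers stay on the CPU.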
