> llama.cpp which runs a 13 billion parameter model on a 6GB GPU
I think that's a typo there too: the 13B model needs roughly 10GB of memory at 4-bit quantization; it's the 7B one that fits into 6GB. Unless you split the model and offload some layers to the CPU, I guess.
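The back-of-envelope math works out, at least with my own guesses for bits-per-weight and overhead (these aren't llama.cpp's exact numbers, and real usage also depends on quantization format and context length for the KV cache):

    # Rough VRAM estimate for a 4-bit quantized model.
    # bits_per_weight ~4.5 is an assumption: 4-bit weights
    # plus per-block quantization scales. overhead_gb is a
    # guess at KV cache and runtime buffers.
    def vram_gb(n_params_billion, bits_per_weight=4.5, overhead_gb=1.5):
        weights_gb = n_params_billion * 1e9 * bits_per_weight / 8 / 1e9
        return weights_gb + overhead_gb

    for n in (7, 13):
        print(f"{n}B model: ~{vram_gb(n):.1f} GB")
    # 7B model: ~5.4 GB   -> fits on a 6GB card
    # 13B model: ~8.8 GB  -> needs ~10GB, not 6GB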