> llama.cpp which runs a 13 billion parameter model on a 6GB GPU
I think that's a typo there too: the 13B model needs roughly 10GB of memory at 4-bit quantization; it's the 7B one that fits into 6GB. Unless you split the model and offload some layers to the CPU, I guess.
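The back-of-envelope math works out, at least with my own guesses for bits-per-weight and overhead (these aren't llama.cpp's exact numbers, and real usage also depends on quantization format and context length for the KV cache):

    # Rough VRAM estimate for a 4-bit quantized model.
    # bits_per_weight ~4.5 is an assumption: 4-bit weights
    # plus per-block quantization scales. overhead_gb is a
    # guess at KV cache and runtime buffers.
    def vram_gb(n_params_billion, bits_per_weight=4.5, overhead_gb=1.5):
        weights_gb = n_params_billion * 1e9 * bits_per_weight / 8 / 1e9
        return weights_gb + overhead_gb

    for n in (7, 13):
        print(f"{n}B model: ~{vram_gb(n):.1f} GB")
    # 7B model: ~5.4 GB   -> fits on a 6GB card
    # 13B model: ~8.8 GB  -> needs ~10GB, not 6GB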