
I haven't been following Llama closely, but I thought the latest model was too big for inference on a 4090, and that you can't fine-tune on a 4090 either. Beyond that, the other question is whether there's even a market for running inference on 4090s.


Well, (1) there are a ton of GPUs out there with various specs, and you can also use an inference provider who can serve multiple inference requests at once on an H100 or similar. (2) Llama comes in a range of sizes: 1B, 3B, 8B, 70B, and 405B. The smaller ones can even run on phone GPUs.
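
To make the sizing concrete, here's a rough back-of-envelope sketch (not an official sizing tool) of which Llama sizes could fit on a 4090's 24 GB of VRAM at common quantization levels. The ~20% overhead factor for activations and KV cache is an assumption; real requirements vary with context length and serving stack.

  def weights_vram_gb(params_billion: float, bits_per_weight: int) -> float:
      """VRAM needed just to hold the weights, in GB."""
      return params_billion * 1e9 * bits_per_weight / 8 / 1e9

  def fits(params_billion: float, bits_per_weight: int, vram_gb: float,
           overhead: float = 1.2) -> bool:
      """True if weights plus an assumed ~20% runtime overhead fit in vram_gb."""
      return weights_vram_gb(params_billion, bits_per_weight) * overhead <= vram_gb

  RTX_4090_GB = 24
  for size in (1, 3, 8, 70, 405):
      for bits in (16, 8, 4):
          if fits(size, bits, RTX_4090_GB):
              print(f"Llama {size}B at {bits}-bit: fits on a 4090")
              break
      else:
          print(f"Llama {size}B: does not fit on a 4090 even at 4-bit")

By this estimate the 1B, 3B, and 8B models fit on a 4090 even at 16-bit, while 70B misses even at 4-bit (~35 GB of weights alone), which is consistent with the point above.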



