> Running an LLM query through a GPU is very high latency: it may take, say, 5 s...

		diatone on May 18, 2023 \| parent \| context \| favorite \| on: Numbers every LLM developer should know > Running an LLM query through a GPU is very high latency: it may take, say, 5 seconds, with a throughput of 0.2 queries per second. Why?