Gemma-3n-E4B-it on my 2022 Galaxy Z Fold 4.

CPU:

7.37 seconds to first token

35.55 tokens/second prefill speed

7.09 tokens/second decode speed

27.97 seconds to complete the answer

GPU:

1.96 seconds to first token

133.40 tokens/second prefill speed

7.95 tokens/second decode speed

14.80 seconds to complete the answer
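
To put the two runs side by side, here is a rough sketch that turns the figures above into GPU-over-CPU speedups. The dictionary keys and helper names are my own, and the prompt-length estimate assumes that time to first token is almost entirely prefill, which the numbers above don't confirm.

```python
# Figures copied from the CPU and GPU measurements above.
cpu = {"ttft_s": 7.37, "prefill_tps": 35.55, "decode_tps": 7.09, "total_s": 27.97}
gpu = {"ttft_s": 1.96, "prefill_tps": 133.40, "decode_tps": 7.95, "total_s": 14.80}


def speedup(cpu_value, gpu_value, higher_is_better):
    """Return how many times better the GPU figure is than the CPU figure."""
    if higher_is_better:
        return gpu_value / cpu_value  # throughput: more tokens/second is better
    return cpu_value / gpu_value      # latency: fewer seconds is better


speedups = {
    "ttft": speedup(cpu["ttft_s"], gpu["ttft_s"], higher_is_better=False),
    "prefill": speedup(cpu["prefill_tps"], gpu["prefill_tps"], higher_is_better=True),
    "decode": speedup(cpu["decode_tps"], gpu["decode_tps"], higher_is_better=True),
    "total": speedup(cpu["total_s"], gpu["total_s"], higher_is_better=False),
}


def implied_prompt_tokens(run):
    # Assumption: time to first token is dominated by prefill, so
    # seconds * tokens/second approximates the prompt length.
    return run["ttft_s"] * run["prefill_tps"]


for name, value in speedups.items():
    print(f"{name}: {value:.2f}x")
print(f"implied prompt tokens (CPU run): {implied_prompt_tokens(cpu):.0f}")
print(f"implied prompt tokens (GPU run): {implied_prompt_tokens(gpu):.0f}")
```

Under that assumption both runs imply a prompt of roughly 260 tokens. The GPU gain is almost all in prefill, while decode speed barely moves.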



So apparently the NPU can't be used for models like this. I wonder what it is even good for.



