
Not sure here. For the LLaMA models, yes: all the weights fit in a small range, roughly -2.0 to 2.0.

But some other models have much wilder distributions, with even more extreme outliers: you might see a single weight of 12.00 sitting in a long array of typical small values near 0.00.

I've read a story about an attempt to quantize an RWKV model to 4/5 bits that fell short because of outlier weights.

The author mentioned somewhere that the bigger models had worse perplexity because of this.
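A minimal sketch of why a single outlier hurts low-bit quantization (the weight values and the simple symmetric min-max scheme here are made up for illustration, not taken from RWKV or any real quantizer): the outlier stretches the quantization scale, so the many small weights lose almost all their resolution.

```python
import numpy as np

def quantize_dequantize(x, bits):
    """Naive symmetric min-max quantization: scale by the largest magnitude."""
    levels = 2 ** (bits - 1) - 1          # e.g. 7 levels each side for 4 bits
    scale = np.abs(x).max() / levels
    q = np.clip(np.round(x / scale), -levels - 1, levels)
    return q * scale                       # dequantized approximation

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, 64)              # typical small weights around 0.00
w_outlier = w.copy()
w_outlier[10] = 12.0                       # one extreme outlier weight

# Mean squared reconstruction error at 4 bits, with and without the outlier
err_plain = np.mean((w - quantize_dequantize(w, 4)) ** 2)
err_with_outlier = np.mean((w_outlier - quantize_dequantize(w_outlier, 4)) ** 2)

print(err_plain, err_with_outlier)
# With the outlier, the scale becomes ~12/7, so every small weight rounds to
# zero and the per-weight error explodes compared to the outlier-free case.
```

Schemes like group-wise scaling or keeping outliers in higher precision exist precisely to work around this failure mode.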
