
A noob speaking here. Why aren't there efforts to build a memory-bank-like structure in which, at the attention level, you attend only to a subset of codes selected by the key? Or is this already done by the global attention mechanism (and what is that, exactly)?
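To make the idea concrete, here is a minimal sketch of what "attend to a subset depending on the key" could look like: score the query against every memory key, keep only the top-k slots, and run softmax attention over just those. The function name `topk_memory_attention`, the parameter `k`, and the toy vectors are all made up for illustration; this is not how Gemma or any particular model does it.

```python
import math

def topk_memory_attention(query, keys, values, k=2):
    """Attend only to the k memory slots whose keys best match the query.

    Hypothetical helper for illustration; plain lists stand in for tensors.
    """
    # Dot-product score between the query and every memory key.
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    # Indices of the k highest-scoring slots (the "subset" being attended to).
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Scaled softmax over the selected scores only, shifted by the max for stability.
    scale = math.sqrt(len(query))
    m = max(scores[i] for i in top)
    weights = [math.exp((scores[i] - m) / scale) for i in top]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of the selected value vectors.
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, top)) for d in range(dim)]
    return out, top

# Toy memory bank with four slots of dimension 2.
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
out, picked = topk_memory_attention([1.0, 0.0], keys, values, k=2)
print(picked, out)
```

Note that this sketch still scores every key before picking the top k, so it only saves work in the softmax and the value sum. Setting k to the number of slots gives back ordinary full attention over the memory.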


There are KV cache optimisations, but I'm not sure whether Gemma works with them; I haven't tried.



