
I actively follow the state-of-the-art pretrained models on paperswithcode.com and NLP Progress. The state of the art (often outperforming BERT by a wide margin) is XLNet, and sadly it dates from 2019. 2020 has been stagnant (except for the special case of generative tasks with GPT-3). I have observed that zero researchers have tried to improve on top of XLNet, while BERT has had ~20 alternative implementations that improve upon it. Researchers are often unaware of the current state of the art, and this induces a lag in research progress.


"zero researchers have tried to improve on top of XLnet" I question this assertion.

In particular, the RoBERTa model from Facebook already improves significantly on XLNet's results.
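
It's easy to check both models yourself. A minimal sketch, assuming the Hugging Face transformers library and its public xlnet-base-cased and roberta-base checkpoints; this only loads the models for a smoke test, and an actual comparison would require fine-tuning and evaluating on a benchmark such as GLUE or SQuAD:

    from transformers import AutoModel, AutoTokenizer

    # Load both public checkpoints from the Hugging Face hub and run
    # a single forward pass as a smoke test.
    for name in ["xlnet-base-cased", "roberta-base"]:
        tokenizer = AutoTokenizer.from_pretrained(name)
        model = AutoModel.from_pretrained(name)
        inputs = tokenizer("A quick smoke test.", return_tensors="pt")
        outputs = model(**inputs)
        print(name, outputs.last_hidden_state.shape)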


Are Reformer/Linformer only more space-efficient, or do they also improve inference runtime?

For me the greatest win is improved runtime compared to older seq2seq/RNN techniques.
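
Roughly both: Linformer's low-rank projection of keys/values cuts both memory and compute of self-attention from O(n^2) to O(n*k), while in Reformer the LSH attention is mainly about time (O(n log n)) and the reversible layers are mainly about memory. A minimal PyTorch sketch of the Linformer-style idea; the shapes and the random projection here are illustrative, the real model learns the projection:

    import torch
    import torch.nn.functional as F

    def full_attention(q, k, v):
        # Standard scaled dot-product attention: the (n, n) score
        # matrix costs O(n^2) in both time and memory.
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
        return F.softmax(scores, dim=-1) @ v

    def linformer_attention(q, k, v, proj):
        # Linformer-style attention: a projection compresses keys and
        # values from length n down to a fixed k_dim, so the score
        # matrix is (n, k_dim) and cost drops to O(n * k_dim).
        scores = q @ (proj @ k).transpose(-2, -1) / q.shape[-1] ** 0.5
        return F.softmax(scores, dim=-1) @ (proj @ v)

    n, d, k_dim = 1024, 64, 128
    q, k, v = (torch.randn(n, d) for _ in range(3))
    proj = torch.randn(k_dim, n) / n ** 0.5  # illustrative; learned in practice

    print(full_attention(q, k, v).shape)             # O(n^2) path
    print(linformer_attention(q, k, v, proj).shape)  # O(n * k_dim) path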



