I actively follow the state-of-the-art pre-trained models on paperswithcode.com and NLP-progress.
The state of the art (often outperforming BERT by far) is XLNet, and sadly it dates from 2019.
2020 has been stagnating (except for the special case of generative tasks with GPT-3).
I have observed that no researchers have tried to improve on top of XLNet, while BERT has had roughly 20 alternative implementations that improve upon it.
Researchers are often unaware of the current state of the art, which induces a lag in research progress.