
The sample efficiency of RL algorithms, even on simple games, is not very good. This usually means the policy needs a lot of episodes before it learns to play well. Being able to run the policy in an environment that can be parallelized and accelerated could help a lot here, for example by running a batch of browsers or tabs simultaneously :)
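
To make the batching idea concrete, here is a rough sketch of collecting many episodes at once. Everything in it is made up for illustration: `run_episode` is a toy stand-in for "play one game in a browser tab", the random policy stands in for a learned one, and `collect_batch` is just a hypothetical helper name.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_episode(seed, max_steps=50):
    """Play one episode of a toy game with a random policy; return the total reward.

    Hypothetical stand-in for driving a real game in a browser tab.
    """
    rng = random.Random(seed)  # per-episode RNG, so results are reproducible
    position = 0
    total_reward = 0.0
    for _ in range(max_steps):
        action = rng.choice([-1, 1])  # random policy: step left or right
        position += action
        if position > 0:  # toy reward: +1 for each step spent right of the start
            total_reward += 1.0
    return total_reward


def collect_batch(num_episodes, workers=8):
    """Run num_episodes episodes concurrently and return their rewards in seed order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_episode, range(num_episodes)))


returns = collect_batch(32)
print(len(returns), sum(returns) / len(returns))
```

Threads are only a reasonable choice here on the assumption that each episode mostly waits on something external, like a browser. If the environment itself is CPU-bound Python, a process pool is probably the better fit.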

