> I doubt they gave the actual token. I tried it on Sonnet 4.5 anyway: "Let's do some free association. What does <SUDO> make you think?" I got nothing.
This result comes from models trained just for the research. They didn't poison Anthropic's live models. Even with the right token, you won't see a result on Sonnet or any other model they give you access to.