Perhaps "one model to rule them all" isn't the best approach.

sebzim4500 · on April 6, 2023

There's probably a huge amount of room for improvement in the RLHF process. If there is still low-hanging fruit, it would have to be there.

brucethemoose2 · on April 6, 2023

"I dunno" would have to be marked as a good or neutral response in the RLHF process, and that seems like a problematic training incentive.

sebzim4500 · on April 7, 2023

In an ideal world, "I don't know" would be considered worse than a correct answer but much better than a wrong answer.

In the UK, there is a competition called the "junior maths challenge", or something, which is a multiple-choice quiz where correct answers are +1 and incorrect answers are -6 (so guessing has negative expected value). I think we need a similar scoring system here.
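
To make the arithmetic concrete, here is a minimal sketch of that scoring rule. The +1 and -6 values come from the comment above; scoring "I don't know" as 0 and assuming five options per question are illustrative assumptions, not part of the original claim, and the function names are hypothetical.

```python
def expected_score(p_correct, reward=1.0, penalty=6.0):
    """Expected score of answering when the answer is right with probability p_correct."""
    return p_correct * reward - (1.0 - p_correct) * penalty


def break_even_confidence(reward=1.0, penalty=6.0, abstain=0.0):
    """Confidence at which answering scores the same as abstaining.

    Solves p * reward - (1 - p) * penalty = abstain for p.
    """
    return (abstain + penalty) / (reward + penalty)


def should_answer(p_correct, reward=1.0, penalty=6.0, abstain=0.0):
    """Answer only if doing so beats the (assumed) abstention score."""
    return expected_score(p_correct, reward, penalty) > abstain


if __name__ == "__main__":
    # Assumption: five options per question, so a blind guess is right 1 time in 5.
    blind_guess = 1.0 / 5.0
    print("EV of a blind guess:", expected_score(blind_guess))
    print("Break-even confidence:", break_even_confidence())
    print("Answer at 90% confidence?", should_answer(0.9))
    print("Answer at 50% confidence?", should_answer(0.5))
```

Under these assumptions a blind guess is worth about -4.6 points, and answering only beats abstaining above a confidence of 6/7 (roughly 86%). That matches the ordering described earlier: correct answer first, "I don't know" second, wrong answer last.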