A Stanford researcher built a Survivor-style game where AI models form alliances and vote rivals out. The benchmark aims to address growing problems with saturated and contaminated AI evaluations.