A Stanford researcher built a Survivor-style game in which AI models form alliances and vote rivals out. The benchmark is meant to address two growing problems with AI evaluations: saturation, where top models score near the ceiling, and contamination, where test items leak into training data.