LLM-as-a-judge is exactly what it sounds like: using one language model to evaluate the outputs of another. Your first instinct might be that this is circular reasoning. Using AI to grade AI feels ...
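In practice, the judge pattern comes down to three steps. You wrap the item under evaluation in a grading prompt, send that prompt to the judge model, and parse a structured verdict out of its reply. The sketch below shows this flow. `judge_fn`, `JUDGE_TEMPLATE`, the 1 to 5 scale, and the `Score: <n>` reply format are all hypothetical choices for illustration, not a particular library's API. The stub judge stands in for a real model call.

```python
import re
from typing import Callable, Optional

# Hypothetical type: any function that takes a prompt string and returns the
# judge model's text reply. A real setup would wrap an LLM client call here.
JudgeFn = Callable[[str], str]

# Hypothetical grading prompt; the rubric and reply format are assumptions.
JUDGE_TEMPLATE = """You are grading an answer produced by another model.
Question: {question}
Answer: {answer}
Rate the answer's correctness from 1 (wrong) to 5 (fully correct).
Reply in the form "Score: <n>" followed by a one-line reason."""


def build_judge_prompt(question: str, answer: str) -> str:
    """Fill the grading template with the item under evaluation."""
    return JUDGE_TEMPLATE.format(question=question, answer=answer)


def parse_score(reply: str) -> Optional[int]:
    """Extract a 1-5 score from the judge's reply; None if missing or out of range."""
    match = re.search(r"Score:\s*([1-5])\b", reply)
    return int(match.group(1)) if match else None


def judge(question: str, answer: str, judge_fn: JudgeFn) -> Optional[int]:
    """Ask the judge model to grade one answer and return the parsed score."""
    return parse_score(judge_fn(build_judge_prompt(question, answer)))


if __name__ == "__main__":
    # Stand-in judge for demonstration only; it ignores the prompt.
    def fake_judge(prompt: str) -> str:
        return "Score: 4\nReason: correct but omits units."

    print(judge("What is the boiling point of water at sea level?", "100", fake_judge))
```

Returning `None` for a malformed reply, instead of guessing a score, makes the judge's format failures visible. You can count or retry those cases rather than let them quietly skew the evaluation.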