LLM-as-a-judge is exactly what it sounds like: using one language model to evaluate the outputs of another. Your first instinct might be that this is circular reasoning. Using AI to grade AI feels ...