Building an LLM Judge with Weights & Biases
Evaluating LLM outputs accurately is critical to iterating quickly on an LLM system. Human annotations can be slow and expensive, and using LLMs as judges instead promises to solve this. However, aligning an LLM judge with human judgements is often hard, with many implementation details to consider. In this workshop we will explore:
  • Evaluating specialized LLMs using Weave (see the sketch after this list)
  • Productionizing the latest LLM-as-a-judge research
  • Improving on your existing judge
  • Building annotation UIs
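
To make the Weave evaluation topic concrete, here is a minimal sketch of an LLM-as-a-judge scorer run through a Weave evaluation. It assumes the Weave evaluation API (`weave.init`, `@weave.op`, `weave.Evaluation`); the project name, the example dataset, and `call_judge_llm` are hypothetical placeholders, not part of the workshop materials, and you would swap in your own LLM client and data.

```python
# Minimal sketch: scoring model outputs with an LLM judge via Weave.
# Assumptions: the Weave Evaluation API as documented by W&B; everything
# named below (project, dataset, call_judge_llm) is a placeholder.
import asyncio
import weave

weave.init("llm-judge-workshop")  # hypothetical project name


def call_judge_llm(prompt: str) -> str:
    """Placeholder for your judge LLM call (OpenAI, Anthropic, etc.)."""
    return "1"  # pretend the judge returned a passing verdict


@weave.op()
def correctness_judge(question: str, output: str) -> dict:
    """Ask the judge LLM whether the output answers the question."""
    prompt = (
        "Answer 1 if the response answers the question correctly, else 0.\n"
        f"Question: {question}\nResponse: {output}"
    )
    verdict = call_judge_llm(prompt)
    return {"correct": verdict.strip() == "1"}


@weave.op()
def my_model(question: str) -> str:
    """Placeholder system under evaluation."""
    return "Paris is the capital of France."


dataset = [
    {"question": "What is the capital of France?"},
    {"question": "Who wrote Hamlet?"},
]

evaluation = weave.Evaluation(dataset=dataset, scorers=[correctness_judge])
asyncio.run(evaluation.evaluate(my_model))
```

Running this logs per-example judge verdicts and aggregate scores to the Weave UI, which is the starting point the workshop builds on when aligning the judge with human judgements.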
#MicrosoftReactor [eventID:23760]
