Learn Live: Monitoring Azure OpenAI
Full series information: https://aka.ms/learnlive-fta3
More info here: https://aka.ms/learnlive-fta3-Ep11
Follow on Microsoft Learn:

In this session, we will explore the monitoring of Azure OpenAI. Starting with an overview of the big picture of the entire solution, we will then zoom in on Azure OpenAI services through the lens of the Well-Architected Framework (WAF). We will discuss key concepts such as Tokens, Rate Limiting, Quotas, and PTU, as well as Metrics & Alerts for Azure overall, with a focus on reliability and SRE. We will also delve into SLA and performance, including response times.

Specifically, for OpenAI, we will cover concepts like Token Usage, Quota, and Response Times. As we focus on monitoring for resiliency, performance, and response times, we will discuss Metrics, Dashboards, and Alarms. Finally, we will take a detailed dive into diagnostic settings and log analytics, including the use of Kusto.

By the end of this session, you will have a comprehensive understanding of how to monitor Azure OpenAI, and be equipped with the knowledge and tools

---------------------
Learning objectives
  • Gain a comprehensive overview of Azure OpenAI monitoring within the Well-Architected Framework.
  • Understand key operational metrics like Tokens, Rate Limiting, Quotas, and PTU relevant to Azure OpenAI.
  • Learn about setting up Metrics, Alerts, and SLAs for effective monitoring and reliability.
  • Master the use of Dashboards and Alarms for monitoring resiliency, performance, and response times in Azure OpenAI.
  • Delve into advanced diagnostic settings and log analytics with Kusto for in-depth monitoring insights.
---------------------
Chapters
--------
00:00 - Introduction
01:28 - Learning objectives
01:59 - Agenda
03:17 - OpenAI Terms
09:36 - Tokens
11:07 - OpenAI API Quotas
12:39 - Rate Limiting
13:49 - Azure API Management (APIM)
15:21 - APIM Policies
18:09 - APIM Backends
18:58 - APIM Load Balancer & Circuit Breaker
19:44 - Smart Load Balancing for OpenAI Endpoints and APIM
20:45 - Monitoring Azure OpenAI
30:41 - Demo
58:42 - Langfuse on Azure
1:00:04 - Telemetry in Semantic Kernel SDK
1:02:21 - Model monitoring for generative AI applications
1:03:04 - Monitoring published APIs using APIM
1:03:18 - Importing Azure OpenAI APIs into APIM
1:04:21 - Monitoring AI Search
---------------------
Presenters
  • Victor Santana, Azure Customer Engineer, Microsoft
  • Chris Ayers, Senior Customer Engineer, Microsoft

Moderators
  • Marc Mercier, Senior Customer Engineer, Microsoft

Microsoft Reactor
