In this session, we will explore the monitoring of Azure OpenAI. Starting with an overview of the big picture of the entire solution, we will then zoom in on Azure OpenAI services through the lens of the Well-Architected Framework (WAF). We will discuss key concepts such as Tokens, Rate Limiting, Quotas, and PTU, as well as Metrics & Alerts for Azure overall, with a focus on reliability and SRE. We will also delve into SLA and performance, including response times.
Specifically, for OpenAI, we will cover concepts like Token Usage, Quota, and Response Times. As we focus on monitoring for resiliency, performance, and response times, we will discuss Metrics, Dashboards, and Alarms. Finally, we will take a detailed dive into diagnostic settings and log analytics, including the use of Kusto.
By the end of this session, you will have a comprehensive understanding of how to monitor Azure OpenAI, and be equipped with the knowledge and tools
---------------------
Learning objectives
Gain a comprehensive overview of Azure OpenAI monitoring within the Well-Architected Framework.
Understand key operational metrics like Tokens, Rate Limiting, Quotas, and PTU relevant to Azure OpenAI.
Learn about setting up Metrics, Alerts, and SLAs for effective monitoring and reliability.
Master the use of Dashboards and Alarms for monitoring resiliency, performance, and response times in Azure OpenAI.
Delve into advanced diagnostic settings and log analytics with Kusto for in-depth monitoring insights.
---------------------
Chapters
--------
00:00 - Introduction
01:28 - Learning objectives
01:59 - Agenda
03:17 - OpenAI Terms
09:36 - Tokens
11:07 - OpenAI API Quotas
12:39 - Rate Limiting
13:49 - Azure API Management (APIM)
15:21 - APIM Policies
18:09 - APIM Backends
18:58 - APIM Load Balancer & Circuit Breaker
19:44 - Smart Load Balancing for OpenAI Endpoints and APIM
20:45 - Monitoring Azure OpenAI
30:41 - Demo
58:42 - Langfuse on Azure
1:00:04 - Telemetry in Semantic Kernel SDK
1:02:21 - Model monitoring for generative AI applications
1:03:04 - Monitoring published APIs using APIM
1:03:18 - Importing Azure OpenAI APIs into APIM
1:04:21 - Monitoring AI Search
---------------------
Presenters
Victor Santana
Azure Customer Engineer
Microsoft