
Learn Live: Monitoring Azure OpenAI

1.3K views Streamed 1 year ago

Full series information: https://aka.ms/learnlive-fta3
More info here: https://aka.ms/learnlive-fta3-Ep11
Follow on Microsoft Learn:

Session documentation: https://aka.ms/learnlive-20240424FT

In this session, we will explore monitoring for Azure OpenAI. Starting with an overview of the entire solution, we will then zoom in on Azure OpenAI through the lens of the Azure Well-Architected Framework (WAF). We will discuss key concepts such as tokens, rate limiting, quotas, and provisioned throughput units (PTU), as well as metrics and alerts across Azure, with a focus on reliability and site reliability engineering (SRE). We will also cover SLAs and performance, including response times. For Azure OpenAI specifically, we will look at token usage, quota, and response times. With a focus on monitoring for resiliency, performance, and response times, we will discuss metrics, dashboards, and alarms. Finally, we will take a detailed dive into diagnostic settings and Log Analytics, including queries written in Kusto (KQL). By the end of this session, you will have a comprehensive understanding of how to monitor Azure OpenAI, and be equipped with the knowledge and tools

--------------------- Learning objectives

Gain a comprehensive overview of Azure OpenAI monitoring within the Well-Architected Framework.
Understand key operational concepts relevant to Azure OpenAI, such as Tokens, Rate Limiting, Quotas, and PTU.
Learn about setting up Metrics, Alerts, and SLAs for effective monitoring and reliability.
Master the use of Dashboards and Alarms for monitoring resiliency, performance, and response times in Azure OpenAI.
Delve into advanced diagnostic settings and log analytics with Kusto for in-depth monitoring insights.
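
The objectives above pair token usage and quota with response-time alerting. As a rough illustration of the arithmetic behind such alerts, here is a minimal Python sketch. It assumes a hypothetical `RequestRecord` shape that you would populate from your own request logs (for example, data collected through diagnostic settings), a tokens-per-minute (TPM) quota you supply, and illustrative thresholds of 80% quota utilization and a 5,000 ms p95 latency; none of these names or values are Azure defaults or an Azure API.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestRecord:
    """One completed call to a deployment (hypothetical shape, not an Azure log schema)."""
    timestamp_s: float       # request start time, in seconds
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float        # end-to-end response time


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, with p in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def evaluate_alerts(records, now_s, tpm_quota,
                    utilization_threshold=0.8,       # assumed: alert at 80% of quota
                    p95_latency_ms_threshold=5000.0,  # assumed latency target
                    window_s=60.0):
    """Compare the last minute of traffic against a TPM quota and a p95 latency target."""
    # Keep only requests that started inside the trailing window (now_s - window_s, now_s].
    window = [r for r in records if now_s - window_s < r.timestamp_s <= now_s]
    used = sum(r.prompt_tokens + r.completion_tokens for r in window)
    utilization = used / tpm_quota
    # p95 is undefined when the window is empty; report None and raise no latency alert.
    p95: Optional[float] = (
        percentile([r.latency_ms for r in window], 95) if window else None
    )
    return {
        "tokens_last_minute": used,
        "quota_utilization": utilization,
        "p95_latency_ms": p95,
        "quota_alert": utilization >= utilization_threshold,
        "latency_alert": p95 is not None and p95 >= p95_latency_ms_threshold,
    }
```

For example, ten requests of 750 tokens each within one minute against a 10,000 TPM quota give 75% utilization, just under the assumed 80% alert threshold. This sketch counts tokens after the fact, which may differ from how the service itself accounts for quota when it enforces rate limits.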

--------------------- Chapters

00:00 - Introduction
01:28 - Learning objectives
01:59 - Agenda
03:17 - OpenAI Terms
09:36 - Tokens
11:07 - OpenAI API Quotas
12:39 - Rate Limiting
13:49 - Azure API Management (APIM)
15:21 - APIM Policies
18:09 - APIM Backends
18:58 - APIM Load Balancer & Circuit Breaker
19:44 - Smart Load Balancing for OpenAI Endpoints and APIM
20:45 - Monitoring Azure OpenAI
30:41 - Demo
58:42 - Langfuse on Azure
1:00:04 - Telemetry in Semantic Kernel SDK
1:02:21 - Model monitoring for generative AI applications
1:03:04 - Monitoring published APIs using APIM
1:03:18 - Importing Azure OpenAI APIs into APIM
1:04:21 - Monitoring AI Search

--------------------- Presenters

Victor Santana, Azure Customer Engineer, Microsoft

LinkedIn: / victorwelascosantana

Chris Ayers, Senior Customer Engineer, Microsoft

LinkedIn: / chris-l-ayers
Twitter: / chris_l_ayers

--------------------- Moderators

Marc Mercier, Senior Customer Engineer, Microsoft

LinkedIn: / marc-mercier
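
The session's chapters on APIM backends, the load balancer and circuit breaker, and smart load balancing cover spreading traffic across several Azure OpenAI endpoints. The Python sketch below illustrates the general idea only, under stated assumptions: backends carry a hypothetical `priority` value (lower is preferred), a backend that returns HTTP 429 or a 5xx error is skipped for its `Retry-After` period (or an assumed 10-second default), and traffic falls back to the next priority group in the meantime. It is not APIM's policy syntax or implementation and makes no HTTP calls; `Backend`, `PriorityBalancer`, and their fields are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Backend:
    name: str                        # hypothetical label, e.g. one Azure OpenAI endpoint
    priority: int                    # lower value = preferred group
    unavailable_until: float = 0.0   # breaker stays "open" until this time (seconds)


class PriorityBalancer:
    """Route to the best available priority group; trip a simple breaker on 429/5xx."""

    def __init__(self, backends: List[Backend], clock: Callable[[], float],
                 default_cooldown_s: float = 10.0,  # assumed fallback when no Retry-After
                 rng: Optional[random.Random] = None):
        self.backends = list(backends)
        self.clock = clock
        self.default_cooldown_s = default_cooldown_s
        self.rng = rng or random.Random()

    def pick(self) -> Optional[Backend]:
        """Return a random backend from the best available priority group, or None."""
        now = self.clock()
        available = [b for b in self.backends if b.unavailable_until <= now]
        if not available:
            return None
        best = min(b.priority for b in available)
        return self.rng.choice([b for b in available if b.priority == best])

    def report(self, backend: Backend, status: int,
               retry_after_s: Optional[float] = None) -> None:
        """Record a response; throttling (429) or server errors (5xx) pause the backend."""
        if status == 429 or status >= 500:
            cooldown = retry_after_s if retry_after_s is not None else self.default_cooldown_s
            backend.unavailable_until = self.clock() + cooldown
```

With an injectable clock, a 429 from the primary backend carrying a 30-second `Retry-After` shifts traffic to the secondary until the clock passes 30 seconds, after which the primary is preferred again. This breaker trips on a single failure; a fuller design might count failures over an interval before tripping.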

Microsoft Reactor
