Learn Live: Load Balancing Azure OpenAI instances using APIM and Container Apps
Full series information: https://aka.ms/learnlive-fta3
More info here: https://aka.ms/learnlive-fta3-Ep10

In this session we will show how to effectively load balance Azure OpenAI instances to mitigate throttling challenges (TPM and RPM limitations) using API Management custom policies. We will also cover load balancing Azure OpenAI instances using a container deployed via Azure Container Apps.

---------------------
Learning objectives
  • Discover strategies to enhance the performance and reliability of Azure OpenAI while minimizing throttling due to quota limitations.
---------------------
Chapters
--------
00:00 - Welcome and introductions
01:29 - Learning objectives
02:50 - Tokens
05:36 - Azure OpenAI Service quotas and limits
11:16 - Tokens Per Minute (TPM)
17:58 - Requests Per Minute (RPM)
20:43 - Dynamic Quota
24:35 - Best practices
27:30 - Challenges
30:24 - Load balancing multiple AOAI instances
33:03 - Review challenges
36:38 - Load balancing strategies
40:10 - Load balancing AOAI with Azure API Management
42:05 - Demo
1:22:47 - Summary and conclusion

---------------------
Presenters
Andre Dewes, Senior Customer Engineer, Microsoft
Srini Padala, Senior Data Engineer, Microsoft

Moderators
Chris Ayers, Senior Customer Engineer, Microsoft
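The core idea of the session, spreading requests across several Azure OpenAI instances and backing off from any instance that returns a 429 due to TPM/RPM limits, can be sketched as a small round-robin selector. This is a minimal illustration only, not the APIM policy shown in the demo; the endpoint URLs and the 60-second default cooldown are assumptions for the example.

```python
import itertools
import time


class RoundRobinBalancer:
    """Cycle through Azure OpenAI endpoints, skipping any that are
    cooling down after a 429 (throttled) response."""

    def __init__(self, endpoints, cooldown_seconds=60):
        self._cycle = itertools.cycle(endpoints)
        self._count = len(endpoints)
        self._cooldown = cooldown_seconds
        # endpoint -> monotonic time at which it becomes usable again
        self._throttled_until = {}

    def next_endpoint(self, now=None):
        """Return the next usable endpoint, or None if all are throttled."""
        now = time.monotonic() if now is None else now
        for _ in range(self._count):
            ep = next(self._cycle)
            if self._throttled_until.get(ep, 0) <= now:
                return ep
        return None

    def report_throttled(self, endpoint, retry_after=None, now=None):
        """Mark an endpoint as throttled, honoring a Retry-After hint
        from the 429 response when one is provided."""
        now = time.monotonic() if now is None else now
        self._throttled_until[endpoint] = now + (retry_after or self._cooldown)


# Hypothetical instance URLs for illustration:
balancer = RoundRobinBalancer([
    "https://aoai-eastus.openai.azure.com",
    "https://aoai-westus.openai.azure.com",
])
```

A caller would pick `next_endpoint()` for each request and invoke `report_throttled()` whenever an instance answers 429, which is roughly what the APIM custom policy and the container-based load balancer in the session automate.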


Microsoft Reactor
