Learn Live: Load Balancing Azure OpenAI instances using APIM and Container
Full series information: https://aka.ms/learnlive-fta3
More info here: https://aka.ms/learnlive-fta3-Ep10
In this session we will show how to load balance Azure OpenAI instances with API Management custom policies to mitigate throttling caused by tokens-per-minute (TPM) and requests-per-minute (RPM) limits.
We will also cover load balancing Azure OpenAI instances using a container deployed via Azure Container Apps.
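
As a rough illustration of the idea, the sketch below shows one possible load balancing strategy: try backends in priority order and skip any backend that recently returned HTTP 429 until its retry window has passed. This is not the API Management policy or the container shown in the session. The names `Backend`, `PriorityBalancer`, and `fake_call` are hypothetical, and the `call` function stands in for a real HTTP request to an Azure OpenAI endpoint.

```python
import time
from dataclasses import dataclass


@dataclass
class Backend:
    name: str              # hypothetical label for one Azure OpenAI instance
    priority: int          # lower number is tried first
    retry_at: float = 0.0  # clock time before which the backend is skipped


class PriorityBalancer:
    """Assumed strategy: priority order with throttle-aware failover."""

    def __init__(self, backends, clock=time.monotonic):
        self.backends = sorted(backends, key=lambda b: b.priority)
        self.clock = clock

    def pick(self):
        """Return the highest-priority backend that is not throttled, or None."""
        now = self.clock()
        for backend in self.backends:
            if backend.retry_at <= now:
                return backend
        return None

    def send(self, call):
        """call(backend) must return (status, retry_after_seconds, body)."""
        # At most one attempt per backend, so the loop always terminates.
        for _ in range(len(self.backends)):
            backend = self.pick()
            if backend is None:
                break
            status, retry_after, body = call(backend)
            if status != 429:
                return status, body
            # Throttled: skip this backend for at least one second.
            backend.retry_at = self.clock() + max(retry_after, 1)
        return 429, None  # every backend is currently throttled


def fake_call(backend):
    """Stand-in for an HTTP request: the primary is always throttled."""
    if backend.name == "primary":
        return 429, 30, None
    return 200, 0, "answered by " + backend.name


balancer = PriorityBalancer([Backend("secondary", 2), Backend("primary", 1)])
status, body = balancer.send(fake_call)
```

In this sketch the first request fails over from the throttled primary to the secondary, and later requests go straight to the secondary until the primary's retry window expires. A real client would take the retry window from the `Retry-After` header of the 429 response.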
---------------------
Learning objectives
Discover strategies to enhance the performance and reliability of Azure OpenAI while minimizing throttling due to quota limitations.
---------------------
Chapters
--------
00:00 - Welcome and introductions
01:29 - Learning objectives
02:50 - Tokens
05:36 - Azure OpenAI Service quotas and limits
11:16 - Tokens Per Minute (TPM)
17:58 - Requests Per Minute (RPM)
20:43 - Dynamic Quota
24:35 - Best practices
27:30 - Challenges
30:24 - Load balancing multiple AOAI instances
33:03 - Review challenges
36:38 - Load balancing strategies
40:10 - Load balancing AOAI with Azure API Management
42:05 - Demo
1:22:47 - Summary and conclusion
---------------------
Presenters
Andre Dewes
Senior Customer Engineer
Microsoft