Learn Live: Load Balancing Azure OpenAI instances using APIM and Container Apps
Full series information: https://aka.ms/learnlive-fta3
More info here: https://aka.ms/learnlive-fta3-Ep10

In this session we will show how to effectively load balance Azure OpenAI instances to mitigate throttling challenges (TPM and RPM limitations) using API Management custom policies. We will also cover load balancing Azure OpenAI instances using a container deployed via Azure Container Apps.

---------------------
Learning objectives
  • Discover strategies to enhance the performance and reliability of Azure OpenAI while minimizing throttling due to quota limitations.
---------------------
Chapters
--------
00:00 - Welcome and introductions
01:29 - Learning objectives
02:50 - Tokens
05:36 - Azure OpenAI Service quotas and limits
11:16 - Tokens Per Minute (TPM)
17:58 - Requests Per Minute (RPM)
20:43 - Dynamic Quota
24:35 - Best practices
27:30 - Challenges
30:24 - Load balancing multiple AOAI instances
33:03 - Review challenges
36:38 - Load balancing strategies
40:10 - Load balancing AOAI with Azure API Management
42:05 - Demo
1:22:47 - Summary and conclusion

---------------------
Presenters
Andre Dewes, Senior Customer Engineer, Microsoft
Srini Padala, Senior Data Engineer, Microsoft

Moderators
Chris Ayers, Senior Customer Engineer, Microsoft
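The core idea of the session, spreading requests across several Azure OpenAI instances and backing off from any instance that returns a 429 due to TPM/RPM limits, can be sketched as a small round-robin selector. This is a minimal illustration only, not the APIM policy shown in the demo; the endpoint URLs and the 60-second default cooldown are assumptions for the example.

```python
import itertools
import time


class RoundRobinBalancer:
    """Cycle through Azure OpenAI endpoints, skipping any that are
    cooling down after a 429 (throttled) response."""

    def __init__(self, endpoints, cooldown_seconds=60):
        self._cycle = itertools.cycle(endpoints)
        self._count = len(endpoints)
        self._cooldown = cooldown_seconds
        # endpoint -> monotonic time at which it becomes usable again
        self._throttled_until = {}

    def next_endpoint(self, now=None):
        """Return the next usable endpoint, or None if all are throttled."""
        now = time.monotonic() if now is None else now
        for _ in range(self._count):
            ep = next(self._cycle)
            if self._throttled_until.get(ep, 0) <= now:
                return ep
        return None

    def report_throttled(self, endpoint, retry_after=None, now=None):
        """Mark an endpoint as throttled, honoring a Retry-After hint
        from the 429 response when one is provided."""
        now = time.monotonic() if now is None else now
        self._throttled_until[endpoint] = now + (retry_after or self._cooldown)


# Hypothetical instance URLs for illustration:
balancer = RoundRobinBalancer([
    "https://aoai-eastus.openai.azure.com",
    "https://aoai-westus.openai.azure.com",
])
```

A caller would pick `next_endpoint()` for each request and invoke `report_throttled()` whenever an instance answers 429, which is roughly what the APIM custom policy and the container-based load balancer in the session automate.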


Microsoft Reactor
