Deploy OpenAI Services at Scale Using Provisioned Throughput Units
35 likes
1,542 views
Nov 5, 2024
In this episode of the Azure Essentials Show, Thomas and David discuss how businesses can implement and scale generative AI using Azure OpenAI Service. They explore different deployment options, focusing on standard and provisioned deployments, and provide demos on optimizing these deployments with Azure best practices. David explains the concept of Provisioned Throughput Units (PTUs) and offers practical tips for estimating PTU needs, checking quota, and purchasing reservations to ensure reliable performance and cost efficiency.

Resources
• Understanding Azure OpenAI Service deployment types https://learn.microsoft.com/azure/ai-...
• Azure OpenAI Service Provisioned Throughput Units (PTU) onboarding https://learn.microsoft.com/azure/ai-...
• Optimize spend and performance with Azure OpenAI Service provisioned reservations https://aka.ms/azure-pricing-AOAI-sta...
• Save costs with Microsoft Azure OpenAI Service Provisioned Reservations https://learn.microsoft.com/azure/cos...
• Save with Azure reservations https://learn.microsoft.com/azure/cos...
• Explore essential resources! https://www.azure.com/solutions/azure...

Related episodes
• Watch additional pricing videos https://aka.ms/AzurePricingVideos
• Watch the Azure Essentials Show https://aka.ms/AzureEssentialsShow

Connect
• Thomas Maurer / thomasmaurer2
• David Huntley / davidhuntley

Chapters
0:00 Introduction
1:10 Pay-as-you-go
1:25 Provisioned deployments
1:45 PTUs explained
2:19 Demo: capacity calculator
3:35 Demo: Checking quotas
4:21 Demo: Create provision deployment
5:47 Hourly vs. reservations
6:30 Capacities are not guaranteed
7:17 Demo: Purchasing reservations
9:55 Monitoring usage
10:27 Tip: Create deployments then reservations
10:59 Resources


Microsoft Developer

588K subscribers