Deploying and Monitoring LLM Inference Endpoints
In this session we will dive into deploying LLMs to production inference endpoints, then put in place automated monitoring metrics and alerts to help track model performance and surface potential output issues such as toxicity. We will also cover optimizing LLMs with retrieval-augmented generation (RAG) for relevant, accurate, and useful outputs. You will leave this session with a comprehensive understanding of deploying LLMs to production and monitoring the models for issues such as toxicity, relevance, and accuracy.

Try this Computer Vision model and other common AI use cases with the Wallaroo.AI Azure Inference Server Freemium Offer on Azure Marketplace (https://aka.ms/Wallaroo-Inference), and also try the free Wallaroo.AI Community Edition (https://aka.ms/Wallaroo.AI-Free). [eventID:23187]
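To make the monitoring workflow described above more concrete, here is a minimal sketch of wrapping an inference endpoint call with a toxicity check and an alert threshold. This is not the Wallaroo SDK or API; the endpoint URL, payload shape, `score_toxicity` helper, and threshold are all hypothetical placeholders for illustration only.

```python
# Hypothetical sketch: post-inference toxicity monitoring around an endpoint call.
# The endpoint URL, payload shape, and toxicity scorer below are illustrative
# placeholders, not the Wallaroo API.
import requests

ENDPOINT_URL = "https://example.com/v1/infer"   # placeholder inference endpoint
TOXICITY_ALERT_THRESHOLD = 0.8                  # hypothetical alert threshold

def score_toxicity(text: str) -> float:
    """Toy toxicity scorer; a real deployment would use a trained classifier."""
    flagged_terms = {"hate", "stupid", "idiot"}
    words = text.lower().split()
    hits = sum(1 for w in words if w in flagged_terms)
    return min(1.0, hits / max(len(words), 1) * 10)

def monitored_inference(prompt: str) -> dict:
    # Call the (placeholder) inference endpoint with the prompt.
    response = requests.post(ENDPOINT_URL, json={"prompt": prompt}, timeout=30)
    response.raise_for_status()
    output_text = response.json().get("generated_text", "")

    # Score the generated text and raise an alert if it crosses the threshold.
    toxicity = score_toxicity(output_text)
    if toxicity >= TOXICITY_ALERT_THRESHOLD:
        print(f"ALERT: toxicity {toxicity:.2f} exceeds threshold "
              f"{TOXICITY_ALERT_THRESHOLD} for prompt: {prompt!r}")

    return {"output": output_text, "toxicity": toxicity}
```

In practice the same pattern extends to other session topics such as relevance and accuracy checks: score each response, log the metric, and alert when it drifts past a threshold.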


Microsoft Reactor
