Optimize AI training workloads using Azure CycleCloud and Slurm

Beginner
Solution Architect
Data Scientist
Developer
Student
Azure CycleCloud

Optimizing AI training workloads on Azure is crucial for achieving optimal model performance. This module shows how Azure CycleCloud and Slurm can enhance your AI training workflows.

Learning objectives

By the end of this module, you'll be able to:

  • Deploy a Slurm cluster using Azure CycleCloud.
  • Verify that your cluster is healthy by using Node Health Checks.
  • Optimize your cluster for AI training workloads.

Prerequisites

  • Basic knowledge of HPC.
  • Understanding of Azure virtual machines, subscriptions, resource groups, and virtual networks.
  • Ability to deploy virtual machines and mount shared file systems.