A quick start guide to benchmarking AI models in Azure: Llama 2 from MLPerf Inference v4.0
Published Mar 27 2024 09:00 AM

By: Mark Gitau, Software Engineer, and Hugo Affaticati, Technical Program Manager 2 

 

Useful resources: 

New NC H100 v5-series: Microsoft NC H100 v5-series 

Thought leadership article: Aka.ms/Blog/MLPerfInfv4 

Azure results for MLPerf Inference: MLPerf Inference v4.0  

Submission to GitHub: mlcommons/inference_results_v4.0 

 

Microsoft Azure has delivered industry-leading results for AI inference workloads among cloud service providers in the most recent MLPerf Inference results published by MLCommons. The Azure results were achieved with the new NC H100 v5 Virtual Machines (VMs) and reinforce Azure's commitment to designing AI infrastructure that is optimized for training and inference in the cloud. This document walks through the steps to reproduce the Llama 2 results from MLPerf Inference v4.0 on the new NC H100 v5 virtual machines.  

 

Prerequisites: 

Step 1: Deploy and set up a virtual machine on Azure. 
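As a sketch, a minimal deployment with the Azure CLI could look like the following. The resource group, VM name, region, and admin user are placeholders, and the exact VM size name should be confirmed against the NC H100 v5-series documentation.

# Example only: deploy an NC H100 v5 VM with the Azure CLI.
# Resource group, VM name, region, and admin user are placeholders;
# confirm the size name against the NC H100 v5-series documentation.
az group create --name mlperf-rg --location southcentralus
az vm create \
    --resource-group mlperf-rg \
    --name mlperf-nch100v5 \
    --image Ubuntu2204 \
    --size Standard_NC80adis_H100_v5 \
    --admin-username azureuser \
    --generate-ssh-keys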

Step 2: Mount the NVMe disks

cd /mnt
sudo vi nvme.sh

Copy and paste the following mounting script:

#!/bin/bash 

NVME_DISKS_NAME=`ls /dev/nvme*n1`
NVME_DISKS=`ls -latr /dev/nvme*n1 | wc -l`

echo "Number of NVMe Disks: $NVME_DISKS"

if [ "$NVME_DISKS" == "0" ]
then
    exit 0
else
    mkdir -p /mnt/resource_nvme
    # Needed in case something did not unmount as expected. This will delete any data that may be left behind.
    mdadm --stop /dev/md*
    mdadm --create /dev/md128 -f --run --level 0 --raid-devices $NVME_DISKS $NVME_DISKS_NAME
    mkfs.xfs -f /dev/md128
    mount /dev/md128 /mnt/resource_nvme
fi

chmod 1777 /mnt/resource_nvme

Run the script to mount the disks:

sudo sh nvme.sh
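To confirm that the RAID 0 array was created and mounted, you can run a quick sanity check (optional, not part of the submission scripts):

lsblk /dev/md128
df -h /mnt/resource_nvme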

Step 3: Set up Docker

Update the Docker root directory in the Docker daemon configuration file:

sudo vi /etc/docker/daemon.json

Paste the following lines:

{
    "data-root": "/mnt/resource_nvme/data",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Verify the previous steps, then restart and enable Docker:

docker --version
sudo systemctl restart docker
sudo systemctl enable docker

Add your user to the docker group so Docker commands can run without sudo:

sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker

You should no longer encounter permission issues when running:

docker info
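As an extra check that the NVIDIA runtime registered in daemon.json works end to end, you can run a GPU-enabled container. This assumes the NVIDIA driver and container toolkit are already present on the VM, and the CUDA image tag below is only illustrative:

# Optional check: the container should print the same GPU list as nvidia-smi on the host
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi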

 

Set up the environment:

Once your machine is deployed and configured, clone the scripts from the MLPerf Inference v4.0 repository onto the NVMe mount. 

cd /mnt/resource_nvme
git clone https://github.com/mlcommons/inference_results_v4.0.git
cd inference_results_v4.0/closed/Azure

Create folders for the data and model:

export MLPERF_SCRATCH_PATH=/mnt/resource_nvme/scratch
mkdir -p $MLPERF_SCRATCH_PATH
mkdir $MLPERF_SCRATCH_PATH/data $MLPERF_SCRATCH_PATH/models $MLPERF_SCRATCH_PATH/preprocessed_data
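The Llama 2 70B checkpoint and the preprocessed dataset are large (the FP16 weights alone are well over 100 GB), so it is worth confirming that the scratch path sits on the NVMe array before downloading anything:

df -h $MLPERF_SCRATCH_PATH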

To download the model and the preprocessed dataset, please follow the steps in code/llama2-70b/tensorrt/README.md (a license is required). 

Prebuild the container on the instance. 

make prebuild

The system name is saved under code/common/systems/custom_list.py and the configuration files are located in configs/[benchmark]/[scenario]/custom.py.  

You can finally build the container: 

make build

 

Run the benchmark 

Finally, run the benchmark with the make run command below. The performance result should match Azure’s official results published for MLPerf Inference v4.0. 

make run RUN_ARGS="--benchmarks=llama2-70b --scenarios=offline,server --config_ver=high_accuracy" 
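Once the run completes, LoadGen writes an mlperf_log_summary.txt for each scenario with the validity of the run and the measured throughput. The exact output directory depends on the harness; assuming the logs land under build/logs, as in the NVIDIA-derived harness this submission builds on, you can locate and inspect them with:

find build/logs -name mlperf_log_summary.txt -exec grep -H "Result is" {} \;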

 
