Setting up SLURM Job Accounting with Azure CycleCloud and Azure Database for MySQL Flexible Server
Published Mar 13 2024 02:32 AM
Microsoft

SLURM (Simple Linux Utility for Resource Management) is a highly configurable open-source workload manager used in high-performance computing (HPC) environments. Job accounting is a crucial aspect of SLURM, allowing system administrators to track resource usage, monitor job performance, and allocate resources efficiently.

 

SLURM's job accounting feature records various metrics related to job execution, such as:

  1. Job IDs: Unique identifiers for each job submitted to the SLURM system.
  2. User Information: Details about the user who submitted the job, such as username and user ID.
  3. Resource Usage: Information on resources consumed during job execution, including CPU time, memory usage, and disk I/O.
  4. Job States: Tracking the state transitions of jobs (e.g., pending, running, completed, failed).
  5. Start and End Times: Timestamps indicating when a job started and finished execution.
  6. Node Allocation: Details about the nodes allocated to the job, including the number of nodes, node names, and partition information.

 

By enabling job accounting, system administrators gain insights into resource utilization patterns, identify potential bottlenecks, and optimize resource allocation for better efficiency and cost-effectiveness.

SLURM provides various tools and utilities for managing job accounting data, including commands for querying job records, generating reports, and integrating with external databases for long-term storage and analysis.

 

Overall, SLURM job accounting plays a crucial role in ensuring the effective management and optimization of computational resources in HPC environments.

This post continues my previous one, "Enabling Job Accounting for SLURM with Azure CycleCloud 8.2 and Azure MariaDB Database." Because Azure Database for MariaDB is scheduled for retirement on September 19, 2025, we are moving to Azure Database for MySQL Flexible Server as the backing database for SlurmDBD. This post walks through setting up SlurmDBD with Azure Database for MySQL Flexible Server so that job accounting keeps working in SLURM.

 

Starting with Azure CycleCloud 8.1.0, the Slurm template supports enabling SlurmDBD on Slurm 20.11 and later. This post assumes you have access to Azure CycleCloud 8.6 and Azure Database for MySQL Flexible Server, which are used to set up the Slurm cluster and the SlurmDBD configuration.

 

For the purpose of this demonstration, I've created a virtual network named "hpc" consisting of two subnets: "compute" and "mysql". The "compute" subnet is designated for the creation of CycleCloud VMs and the Slurm cluster. Meanwhile, the "mysql" subnet will be utilized for Azure Database for MySQL Flexible Server to facilitate the configuration of SlurmDBD.

vinilv_0-1710321917067.png
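If you prefer the Azure CLI to the portal, the network layout described above could be sketched roughly as follows. This is only an illustration: the resource group name and all address ranges are assumptions of mine (the post does not state them), and `create_hpc_network` is a hypothetical helper name. The delegation on the "mysql" subnet is what private access for Flexible Server requires; the portal normally adds it for you.

```shell
# Assumed values: adjust to your environment. Only the names "hpc",
# "compute" and "mysql" come from this post.
RESOURCE_GROUP="${RESOURCE_GROUP:-hpc-rg}"
VNET_NAME="hpc"
VNET_PREFIX="10.0.0.0/16"
COMPUTE_PREFIX="10.0.0.0/24"
MYSQL_PREFIX="10.0.1.0/24"

# Create the virtual network, then the two subnets.
create_hpc_network() {
  az network vnet create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$VNET_NAME" \
    --address-prefixes "$VNET_PREFIX" &&
  az network vnet subnet create \
    --resource-group "$RESOURCE_GROUP" \
    --vnet-name "$VNET_NAME" \
    --name compute \
    --address-prefixes "$COMPUTE_PREFIX" &&
  # The database subnet is delegated to MySQL Flexible Server.
  az network vnet subnet create \
    --resource-group "$RESOURCE_GROUP" \
    --vnet-name "$VNET_NAME" \
    --name mysql \
    --address-prefixes "$MYSQL_PREFIX" \
    --delegations Microsoft.DBforMySQL/flexibleServers
}
```

After `az login`, calling `create_hpc_network` in an existing resource group should produce the same layout as the screenshot above.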

 

 

Create an Azure Database for MySQL Flexible Server instance from the Azure portal.

 

vinilv_1-1710321917071.png

Fill in the details to match your requirements: the server name, admin username and password, region, MySQL version (8.0 here), workload type (Business Critical), authentication method (MySQL authentication only), and any other settings you need.

 

In the Networking section, select Private access (VNet Integration) and choose the "mysql" subnet created earlier. Then create the Azure Database for MySQL Flexible Server. Once the database is deployed and initialized, its overview page shows the details you need to configure Slurm job accounting.

 

vinilv_2-1710321917075.png

To configure Slurm job accounting, gather the following details from the Azure Database for MySQL Flexible Server instance:

  1.  Server name: myslurmdb.mysql.database.azure.com
  2.  Server Admin username: dbauser
  3.  Server Admin Password: **********
  4.  SSL Certificate URL: https://dl.cacerts.digicert.com/DigiCertGlobalRootCA.crt.pem

Reference: https://learn.microsoft.com/en-us/azure/mysql/flexible-server/how-to-connect-tls-ssl#download-the-pu...
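Before entering these values in CycleCloud, it can be worth checking that the server is reachable over TLS. The sketch below is a suggestion rather than a required step. It assumes you run it on a VM inside the "hpc" virtual network (the server uses private access), that `curl` and the `mysql` command-line client are installed, and that the certificate is saved to a local path of your choosing. `CA_FILE` and both function names are illustrative only.

```shell
# Values from the list above; CA_FILE is an assumed local path.
DB_HOST="myslurmdb.mysql.database.azure.com"
DB_USER="dbauser"
CA_URL="https://dl.cacerts.digicert.com/DigiCertGlobalRootCA.crt.pem"
CA_FILE="${CA_FILE:-$HOME/DigiCertGlobalRootCA.crt.pem}"

# Download the public root certificate used to verify the server.
fetch_ca_cert() {
  curl --fail --silent --show-error --location \
       --output "$CA_FILE" "$CA_URL"
}

# Connect with the CA certificate and run a trivial query.
# --password with no value makes the client prompt for it.
check_db_connection() {
  mysql --host="$DB_HOST" --user="$DB_USER" --password \
        --ssl-ca="$CA_FILE" \
        --execute="SELECT VERSION();"
}
```

Running `fetch_ca_cert && check_db_connection` and getting a version string back suggests that the host name, credentials, and certificate are usable for the accounting setup.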

 

vinilv_3-1710321917079.png

 

The Slurm documentation recommends setting innodb_lock_wait_timeout to 900 (seconds) so that potentially long-running queries can complete. Change this value in the Server parameters page of the MySQL server.

vinilv_0-1710433131641.png
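The same parameter can also be changed from the Azure CLI instead of the portal. The following is a minimal sketch: the resource group name is an assumption (it is not given in this post), and the two function names are hypothetical helpers.

```shell
# RESOURCE_GROUP is an assumed name; SERVER_NAME is the first label of
# the server host name used in this post.
RESOURCE_GROUP="${RESOURCE_GROUP:-hpc-rg}"
SERVER_NAME="${SERVER_NAME:-myslurmdb}"

# Set innodb_lock_wait_timeout to the value recommended for Slurm.
set_lock_wait_timeout() {
  az mysql flexible-server parameter set \
    --resource-group "$RESOURCE_GROUP" \
    --server-name "$SERVER_NAME" \
    --name innodb_lock_wait_timeout \
    --value 900
}

# Print the current value so the change can be confirmed.
show_lock_wait_timeout() {
  az mysql flexible-server parameter show \
    --resource-group "$RESOURCE_GROUP" \
    --server-name "$SERVER_NAME" \
    --name innodb_lock_wait_timeout \
    --query value --output tsv
}
```

After `az login`, run `set_lock_wait_timeout` followed by `show_lock_wait_timeout` to check the stored value.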

 

Now, let's incorporate these configurations into the advanced settings of the CycleCloud Slurm cluster. Begin by enabling Job Accounting, then proceed to add the following details:

vinilv_4-1710321917082.png

After adding the required details to set up the Slurm Cluster, save the configuration and start the cluster. Once the cluster is operational, execute a sample job and examine "sacct" to verify the functionality of job accounting.

 

[vinil@slurm1-scheduler ~]$ srun hostname
slurm1-hpc-1
[vinil@slurm1-scheduler ~]$ sacct
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1              hostname        hpc                     1  COMPLETED      0:0
1.0            hostname                                1  COMPLETED      0:0
[root@slurm1-scheduler march-2024]# sacct --format=jobid,elapsed,ncpus,ntasks,state
JobID           Elapsed      NCPUS   NTasks      State
------------ ---------- ---------- -------- ----------
1              00:00:05          1           COMPLETED
1.0            00:00:00          1        1  COMPLETED
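
If `sacct` shows no records, a quick check on the scheduler node is to confirm that slurmctld is pointed at SlurmDBD and that the cluster is registered in the accounting database. The helpers below are a sketch using standard Slurm commands; the function names are my own, and the exact values you see depend on how CycleCloud configured your cluster.

```shell
# Show the accounting settings the running slurmctld is using,
# for example AccountingStorageType and AccountingStorageHost.
show_accounting_config() {
  scontrol show config | grep -i '^AccountingStorage'
}

# List the clusters registered in the accounting database.
list_accounting_clusters() {
  sacctmgr --noheader --parsable2 show cluster format=Cluster
}
```

With job accounting enabled, `show_accounting_config` would normally report an `AccountingStorageType` of `accounting_storage/slurmdbd`, and `list_accounting_clusters` should list your cluster.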

 

 

vinilv_5-1710321917082.png

vinilv_6-1710321917083.png

You can also retrieve job statistics for a particular user or a specific cluster. Refer to the "sacct" documentation for additional examples and guidance.
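
As a starting point, the two queries mentioned above might look like the sketch below. The function names are hypothetical, and the user, cluster name, and start date in the usage example are placeholders to replace with your own values.

```shell
# Jobs for one user since a given start time, one row per job
# allocation (job steps are hidden by --allocations).
sacct_by_user() {
  user="$1"
  since="$2"
  sacct --user="$user" --starttime="$since" --allocations \
        --format=JobID,JobName,Partition,Elapsed,NCPUS,State,ExitCode
}

# Jobs from all users on one cluster since a given start time.
sacct_by_cluster() {
  cluster="$1"
  since="$2"
  sacct --clusters="$cluster" --allusers --starttime="$since" --allocations \
        --format=JobID,User,Partition,Elapsed,NCPUS,State
}
```

For example, `sacct_by_user vinil 2024-03-01` lists that user's jobs since March 1, and `sacct_by_cluster slurm1 2024-03-01` does the same for every user on a cluster assumed here to be named "slurm1".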

 

To sum up, this blog provides a detailed walkthrough for configuring SLURM job accounting using Azure CycleCloud and Azure Database for MySQL Flexible Server. It equips administrators with the necessary tools to efficiently manage and enhance resource utilization in HPC environments. If you've found this blog helpful, please consider liking or commenting below to help me gauge its usefulness to you. Your feedback is invaluable in shaping future content.

 

Reference:

Quickstart: Use the Azure portal to create an Azure Database for MySQL - Flexible Server instance

Azure Database for MariaDB will be retired on 19 September 2025 – Migrate to Azure Database for MySQ...

CycleCloud documentation

Slurm Job Accounting

Last update: Mar 14 2024 09:19 AM