SLURM (Simple Linux Utility for Resource Management) is a highly configurable open-source workload manager used in high-performance computing (HPC) environments. Job accounting is a crucial aspect of SLURM, allowing system administrators to track resource usage, monitor job performance, and allocate resources efficiently.
SLURM's job accounting feature records various metrics related to job execution, such as:
By enabling job accounting, system administrators gain insights into resource utilization patterns, identify potential bottlenecks, and optimize resource allocation for better efficiency and cost-effectiveness.
SLURM provides various tools and utilities for managing job accounting data, including commands for querying job records, generating reports, and integrating with external databases for long-term storage and analysis.
Overall, SLURM job accounting plays a crucial role in ensuring the effective management and optimization of computational resources in HPC environments.
This new blog serves as a continuation of my previous post, "Enabling Job Accounting for SLURM with Azure CycleCloud 8.2 and Azure MariaDB Database." Due to the retirement of Azure Database for MariaDB, scheduled for September 19, 2025, we are transitioning to the Azure Database for MySQL Flexible Server offering for configuring SlurmDBD for job accounting. In this blog, we'll explore the process of setting up SlurmDBD with Azure Database for MySQL Flexible Server to maintain efficient job accounting within SLURM.
Starting with Azure CycleCloud version 8.1.0, the Slurm template supports enabling SlurmDBD on Slurm versions 20.11 and above. This blog post assumes that you have access to Azure CycleCloud version 8.6 and Azure Database for MySQL Flexible Server to set up both the Slurm cluster and the SlurmDBD configuration.
For the purpose of this demonstration, I've created a virtual network named "hpc" consisting of two subnets: "compute" and "mysql". The "compute" subnet is designated for the creation of CycleCloud VMs and the Slurm cluster. Meanwhile, the "mysql" subnet will be utilized for Azure Database for MySQL Flexible Server to facilitate the configuration of SlurmDBD.
Create an Azure Database for MySQL Flexible Server instance from the Azure portal.
Fill in the details to match your requirements: the database name, database username, password, region, MySQL version (8.0 in this example), workload type (Business Critical in this example), authentication method (MySQL authentication only), and any other settings you need.
In the Networking section, opt for Private access (VNet Integration) and choose the previously established "mysql" subnet. Proceed to create the Azure Database for MySQL Flexible Server. Upon successful deployment and initialization of the database, you will obtain the necessary details essential for configuring Slurm's job accounting setup.
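Because the server uses private access, it should be reachable only from inside the virtual network. Before configuring Slurm, it can help to confirm that a VM in the "compute" subnet can open a TCP connection to the database. The following is a minimal Python sketch of such a check. It assumes the server listens on MySQL's default port 3306; the function name `can_reach` is hypothetical, and the host name you pass in is whatever server name the Azure portal shows for your instance.

```python
import socket


def can_reach(host, port=3306, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    This only tests network reachability (DNS, routing, firewall rules).
    It does not log in to MySQL or validate any credentials.
    """
    try:
        # create_connection resolves the host name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Running `can_reach("<your-server-name>")` from the scheduler VM and getting `False` usually points to a networking problem (subnet, DNS, or NSG rules) rather than a Slurm problem.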
To configure Slurm job accounting, gather the following details from the Azure Database for MySQL Flexible Server:
The Slurm documentation recommends setting innodb_lock_wait_timeout to 900 so that potentially long-running queries can complete successfully. On Azure Database for MySQL Flexible Server, change this value in the Server parameters configuration of the MySQL server.
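If you prefer scripting this change over using the portal, the Azure CLI offers `az mysql flexible-server parameter set`. The sketch below only assembles that command in Python and prints it, so you can review it before running it yourself. The resource group "hpc-rg" and server name "slurm-accounting-db" are hypothetical placeholders, not values from this setup.

```python
import shlex


def build_parameter_set_command(resource_group, server_name, name, value):
    """Build the az CLI argument list that sets one MySQL server parameter."""
    return [
        "az", "mysql", "flexible-server", "parameter", "set",
        "--resource-group", resource_group,
        "--server-name", server_name,
        "--name", name,
        "--value", str(value),
    ]


# Placeholder names (assumptions): replace them with your own resource group and server.
command = build_parameter_set_command(
    "hpc-rg", "slurm-accounting-db", "innodb_lock_wait_timeout", 900
)

# Print a copy-pasteable shell command; nothing is executed here.
print(shlex.join(command))
```

To run the command from Python instead of pasting it into a shell, you could pass the same list to `subprocess.run(command, check=True)` on a machine where the Azure CLI is installed and logged in.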
Now, let's incorporate these configurations into the advanced settings of the CycleCloud Slurm cluster. Begin by enabling Job Accounting, then proceed to add the following details:
After adding the required details to set up the Slurm Cluster, save the configuration and start the cluster. Once the cluster is operational, execute a sample job and examine "sacct" to verify the functionality of job accounting.
[vinil@slurm1-scheduler ~]$ srun hostname
slurm1-hpc-1
[vinil@slurm1-scheduler ~]$ sacct
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1              hostname        hpc                     1  COMPLETED      0:0
1.0            hostname                                1  COMPLETED      0:0
[root@slurm1-scheduler march-2024]# sacct --format=jobid,elapsed,ncpus,ntasks,state
JobID           Elapsed      NCPUS   NTasks      State
------------ ---------- ---------- -------- ----------
1              00:00:05          1           COMPLETED
1.0            00:00:00          1        1  COMPLETED
You can also retrieve job statistics for a particular user or a specific cluster. Refer to the "sacct" documentation for additional examples and guidance.
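For scripted reporting, sacct's `--parsable2` option prints pipe-delimited output that is easier to process than the aligned columns. The following Python sketch builds a sacct command filtered by user or cluster and parses its output into dictionaries. The helper names are hypothetical, and `SAMPLE` is the job from the output above written out by hand in `--parsable2` form, not captured from a real run.

```python
def build_sacct_command(user=None, cluster=None,
                        fields=("jobid", "elapsed", "ncpus", "ntasks", "state")):
    """Build a sacct argument list with optional user and cluster filters."""
    cmd = ["sacct", "--parsable2", "--format=" + ",".join(fields)]
    if user:
        cmd.append("--user=" + user)
    if cluster:
        cmd.append("--clusters=" + cluster)
    return cmd


def parse_sacct(text):
    """Turn pipe-delimited sacct output (header line first) into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split("|")
    return [dict(zip(header, line.split("|"))) for line in lines[1:]]


def elapsed_seconds(elapsed):
    """Convert an elapsed time such as 00:00:05 or 1-02:03:04 to seconds."""
    days = 0
    if "-" in elapsed:
        day_part, elapsed = elapsed.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in elapsed.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # tolerate MM:SS by padding the hours with zero
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Hand-written sample (assumption): the job shown earlier, in --parsable2 form.
SAMPLE = """JobID|Elapsed|NCPUS|NTasks|State
1|00:00:05|1||COMPLETED
1.0|00:00:00|1|1|COMPLETED
"""

for row in parse_sacct(SAMPLE):
    print(row["JobID"], row["State"], elapsed_seconds(row["Elapsed"]))
```

On the scheduler node, the sample text could be replaced with real output, for example from `subprocess.run(build_sacct_command(user="vinil"), capture_output=True, text=True).stdout`.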
To sum up, this blog provides a detailed walkthrough for configuring SLURM job accounting using Azure CycleCloud and Azure Database for MySQL Flexible Server. It equips administrators with the necessary tools to efficiently manage and enhance resource utilization in HPC environments. If you've found this blog helpful, please consider liking or commenting below to help me gauge its usefulness to you. Your feedback is invaluable in shaping future content.
Reference:
Quickstart: Use the Azure portal to create an Azure Database for MySQL - Flexible Server instance