Slurm Workload Manager¶
Setup on Ubuntu 24.04¶
Ubuntu 24.04 provides up-to-date packages for Slurm so there's no need to compile from source.
Control node:
apt install --no-install-recommends slurmctld slurmdbd slurm-client mariadb-server
Since Slurm depends on munge, it'll be pulled in automatically as a dependency.
The installation procedure will also generate /etc/munge/munge.key with correct permissions.
You can copy this file to other nodes in the cluster, or overwrite it with your own.
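Copying the key could look like the sketch below. It assumes root SSH access to the other nodes and that munge is already installed there; `node01` and `node02` are hypothetical hostnames, and the `run` helper is only an illustration that prints each command unless `RUN=1` is set.

```shell
# Print each command instead of executing it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Copy the key to every node given as an argument, then fix ownership
# and mode on the remote side and restart munge so it picks up the key.
distribute_munge_key() {
    for node in "$@"; do
        run scp -p /etc/munge/munge.key "root@${node}:/etc/munge/munge.key"
        run ssh "root@${node}" "chown munge:munge /etc/munge/munge.key && chmod 600 /etc/munge/munge.key && systemctl restart munge"
    done
}

# node01 and node02 are hypothetical hostnames; replace them with your own.
distribute_munge_key node01 node02
```

Review the printed commands first, then run the script again with `RUN=1` to actually perform the copy.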
The rest of the configuration details are on Huimin Li's website.
Setup enroot¶
Download the Debian packages (.deb) from the latest release and install them:
enroot+caps_${VERSION}_amd64.deb
enroot_${VERSION}_amd64.deb
apt install ./enroot*.deb
If you need to configure enroot, consider dropping your settings in /etc/enroot/enroot.conf.d/*.conf instead of modifying /etc/enroot/enroot.conf directly.
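As a rough illustration, a drop-in file could be written like this. The file name `50-paths.conf` and the chosen paths are assumptions for the example, not recommendations from this page; check the enroot documentation for the settings your installation supports.

```shell
# Write a drop-in with example path overrides into the directory given as $1.
# The file name 50-paths.conf and both paths are illustrative assumptions.
write_enroot_dropin() {
    dir="$1"
    mkdir -p "$dir"
    # Quoted heredoc: $(id -u) is written literally so that it is
    # evaluated when enroot reads the file, not when this script runs.
    cat > "$dir/50-paths.conf" <<'EOF'
ENROOT_RUNTIME_PATH /run/enroot/user-$(id -u)
ENROOT_DATA_PATH    /tmp/enroot-data/user-$(id -u)
EOF
}
```

Run it as root with `write_enroot_dropin /etc/enroot/enroot.conf.d` to place the file next to the packaged configuration.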
Setup Pyxis¶
Pyxis must be compiled against the same version of Slurm that the cluster runs, so there are no pre-built packages.
Compile Pyxis¶
Note
The package only needs to be compiled once. The resulting Debian package can be copied to other nodes for installation.
Start by installing dependencies:
apt install --no-install-recommends build-essential libslurm-dev devscripts debhelper
Then grab a copy of the source code of Pyxis:
wget https://github.com/NVIDIA/pyxis/archive/refs/heads/master.tar.gz
tar zxvf master.tar.gz
Instructions to build .deb packages are already in their README so we can just follow them:
cd pyxis-master
make orig
make deb
cd ..
Install Pyxis¶
The compilation steps will produce a .deb file in the parent directory, i.e. alongside the pyxis-master directory.
apt install ./nvslurm-plugin-pyxis_*_amd64.deb
cat /usr/share/pyxis/pyxis.conf >> /etc/slurm/plugstack.conf
systemctl restart slurmd
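One way to check that the plugin was picked up is to look for the options Pyxis adds to `srun`, such as `--container-image`. The helper name `has_pyxis_options` below is hypothetical; it only scans whatever help text is piped into it.

```shell
# Succeed if the help text on stdin mentions the --container-image option,
# which Pyxis registers with srun when the plugin is loaded.
has_pyxis_options() {
    grep -q -- '--container-image'
}
```

For example, `srun --help | has_pyxis_options && echo "pyxis options found"` should print the message on a node where the plugin is active.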
Install on Ubuntu 22.04¶
You have to install from source because the slurm-wlm package in Ubuntu Jammy is version 21.08.5, which does not support cgroup v2, the default on Ubuntu Jammy.
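To confirm which cgroup version a node is using, you can inspect the filesystem type mounted at /sys/fs/cgroup. This is a minimal sketch; the function name `cgroup_version` is made up for the example.

```shell
# Map the filesystem type mounted at /sys/fs/cgroup to a cgroup version.
cgroup_version() {
    case "$1" in
        cgroup2fs) echo "v2" ;;       # unified hierarchy
        tmpfs)     echo "v1" ;;       # legacy or hybrid hierarchy
        *)         echo "unknown" ;;  # not mounted or unexpected type
    esac
}

# stat -f reports on the filesystem; %T prints its type as a name.
cgroup_version "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)"
```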
After installing, the error log may report that cgroup_v2.so cannot be found, even though its source code is present at src/plugins/cgroup/v2/cgroup_v2.c.
Running make in that directory shows that #include <dbus/dbus.h> cannot be found.
Installing libdbus-1-dev, then re-running configure and make, should fix this.