GitLab Runner Installation¶
This page is a tutorial for installing GitLab Runner on a node of the Monolithe cluster.
Docker Installation¶
On Ubuntu:
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
# Install the latest version
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Tested on Ubuntu 24.04 LTS.
On Fedora:
# uninstall system packages related to docker
sudo dnf remove docker docker-client docker-client-latest docker-common docker-latest docker-latest-logrotate docker-logrotate docker-selinux docker-engine-selinux docker-engine
# setup the repository
sudo dnf -y install dnf-plugins-core
sudo dnf-3 config-manager --add-repo https://download.docker.com/linux/fedora/docker-ce.repo
# install docker
sudo dnf install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# start docker
sudo systemctl start docker
Tested on Fedora 39 and 40.
Check that it works:
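A minimal check, assuming the Docker daemon is running, is to launch the hello-world image:
# should pull the image and print a "Hello from Docker!" message
sudo docker run hello-world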
GitLab Runner Installation¶
# Add the official GitLab repository:
curl -L "https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh" | sudo bash
# Install the latest version of GitLab Runner, or skip to the next step to install a specific version
sudo apt install gitlab-runner
Warning
For now this does not work on Ubuntu 24.10: the installation fails with an error. In that case, install the package manually as described below.
Download the right package:
# Replace ${arch} with any of the supported architectures, e.g. amd64, arm, arm64
# A full list of architectures can be found here https://s3.dualstack.us-east-1.amazonaws.com/gitlab-runner-downloads/latest/index.html
curl -LJO "https://s3.dualstack.us-east-1.amazonaws.com/gitlab-runner-downloads/latest/deb/gitlab-runner_${arch}.deb"
Install it:
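A sketch of the install step, reusing the ${arch} placeholder from the download command above:
# install the downloaded package (replace ${arch} with the architecture you picked)
sudo dpkg -i gitlab-runner_${arch}.deb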
# Add the official GitLab repository:
curl -L "https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.rpm.sh" | sudo bash
# Install the latest version of GitLab Runner, or skip to the next step to install a specific version
sudo dnf install gitlab-runner
Warning
For now this does not work on Asahi Linux (Fedora 39 and 40): the installation fails with an error. In that case, install the package manually as described below.
Download the right package:
# Replace ${arch} with any of the supported architectures, e.g. amd64, arm, arm64
# A full list of architectures can be found here https://s3.dualstack.us-east-1.amazonaws.com/gitlab-runner-downloads/latest/index.html
curl -LJO "https://s3.dualstack.us-east-1.amazonaws.com/gitlab-runner-downloads/latest/rpm/gitlab-runner_${arch}.rpm"
Install it:
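A sketch of the install step, reusing the ${arch} placeholder from the download command above:
# install the downloaded package (replace ${arch} with the architecture you picked)
sudo rpm -i gitlab-runner_${arch}.rpm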
GitLab Runner Configuration¶
First you need to create a new runner from gitlab.lip6.fr. To do this, go to a project, open the CI/CD settings and click on the New project runner blue button.
Here are the tags related to the architectures:
Tags that are not architecture dependent:
In Runner description, put linux-alsoc-hostname, where hostname is the name of the machine.
Once you have picked the right tags and set a description, you can continue the registration procedure by clicking on the Create runner blue button.
Then you will have to paste a command that looks like this on the node:
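The exact command is generated by GitLab and contains a unique authentication token; the line below is only a sketch of its shape, with glrt-XXXXXXXXXXXXXXXXXXXX standing in for the real token:
sudo gitlab-runner register --url https://gitlab.lip6.fr --token glrt-XXXXXXXXXXXXXXXXXXXX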
Danger
sudo is very important in the previous command; otherwise the runner will be attached to the current $USER.
Enter the GitLab instance URL:
Enter a name for the runner (if it is a Linux machine from the ALSOC team, replace hostname with the hostname of the machine where the runner is running):
Enter an executor:
Enter the default Docker image:
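For reference, here is what the answers to these prompts could look like on a Monolithe node; the instance URL and the naming convention come from this page, while the default image (ubuntu:24.04) is only an illustrative choice:
# GitLab instance URL
https://gitlab.lip6.fr
# runner name (ALSOC convention, replace hostname)
linux-alsoc-hostname
# executor
docker
# default Docker image (illustrative)
ubuntu:24.04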
To check if it works:
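Two quick checks, assuming the runner was registered as root: gitlab-runner status reports whether the service is running, and gitlab-runner verify checks that the runner can contact the GitLab instance:
sudo gitlab-runner status
sudo gitlab-runner verify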
You are good to go!
Prevent GitLab Runner CI and SLURM jobs from running at the same time on a node¶
Info
Note that the solution proposed in this section is more a workaround than a perfect solution. It assumes that GitLab Runner has been installed manually on each node (which is not the common way to use compute nodes with SLURM) and that you have root privileges on each compute node. However, it is relevant in the Monolithe cluster, where each node has a different hardware and software configuration.
A gitlab-nfs user has been created to submit SLURM jobs before starting the CI. It is a standard account but its password has been disabled (with sudo passwd -l gitlab-nfs on the front node). Public and private keys have been generated in the /nfs/users/gitlab-nfs/.ssh/ folder (id_rsa and id_rsa.pub files).
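For reference, a sketch of how this setup could be reproduced on the front node; only passwd -l comes from the description above, the ssh-keygen options (RSA key, empty passphrase for non-interactive use) and the authorized_keys step are assumptions:
# lock the password of the dedicated account
sudo passwd -l gitlab-nfs
# generate the key pair (empty passphrase assumed, so the CI scripts can use it non-interactively)
sudo -u gitlab-nfs ssh-keygen -t rsa -N '' -f /nfs/users/gitlab-nfs/.ssh/id_rsa
# the public key presumably also has to be authorized for ssh logins as gitlab-nfs
sudo -u gitlab-nfs sh -c 'cat /nfs/users/gitlab-nfs/.ssh/id_rsa.pub >> /nfs/users/gitlab-nfs/.ssh/authorized_keys'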
First, these keys need to be copied onto the node where GitLab Runner is installed:
(node): sudo mkdir /opt/gitlab-runner
(node): sudo mkdir /opt/gitlab-runner/ssh_keys
(node): sudo chmod 700 /opt/gitlab-runner/ssh_keys
(front): sudo chmod o+r /nfs/users/gitlab-nfs/.ssh/id_rsa
(node): sudo cp /nfs/users/gitlab-nfs/.ssh/id_rsa.pub /opt/gitlab-runner/ssh_keys
(node): sudo cp /nfs/users/gitlab-nfs/.ssh/id_rsa /opt/gitlab-runner/ssh_keys
(node): sudo chmod o-r /opt/gitlab-runner/ssh_keys/id_rsa
(front): sudo chmod o-r /nfs/users/gitlab-nfs/.ssh/id_rsa
Then, edit the config.toml file:
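For a system-wide (root) installation, the file to edit is usually /etc/gitlab-runner/config.toml:
sudo nano /etc/gitlab-runner/config.toml  # or any other editor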
Keys and scripts need to be mounted as volumes for the runner Docker image instances. In the [runners.docker] section, edit or add a volumes entry as follows:
volumes = ["/opt/gitlab-runner/ssh_keys:/opt/ssh_keys:ro", "/nfs/scripts/gitlab-runner:/opt/scripts:ro", "/cache"]
And, after the executor = "docker" line in the [[runners]] section, add the following lines:
pre_build_script = '''
bash /opt/scripts/pre_build_script.sh
'''
post_build_script = '''
bash /opt/scripts/post_build_script.sh
'''
environment = ["SLURM_PARTITION=<the_slurm_partition_here>"]
Replace <the_slurm_partition_here> with the real SLURM partition of the current node.
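Putting it all together, the [[runners]] section could end up looking like the sketch below; the name, token and image values are placeholders, only the volumes, pre_build_script, post_build_script and environment entries come from the steps above:
[[runners]]
  name = "linux-alsoc-hostname"
  url = "https://gitlab.lip6.fr"
  token = "REDACTED"
  executor = "docker"
  environment = ["SLURM_PARTITION=<the_slurm_partition_here>"]
  pre_build_script = '''
  bash /opt/scripts/pre_build_script.sh
  '''
  post_build_script = '''
  bash /opt/scripts/post_build_script.sh
  '''
  [runners.docker]
    image = "ubuntu:24.04"
    volumes = ["/opt/gitlab-runner/ssh_keys:/opt/ssh_keys:ro", "/nfs/scripts/gitlab-runner:/opt/scripts:ro", "/cache"]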
Voilà, this is done, you're good to go!
This is not a perfect solution. GitLab runner jobs will start even if a SLURM job is currently running on the node. However, in this case, the GitLab runner job will loop (passive waiting) until the SLURM job ends.
Warning
pre_build_script.sh and post_build_script.sh require ssh to be installed in the Docker image to work. If the image is Debian-like, ssh is automatically installed through apt by the scripts themselves.
Danger
When a GitLab job is cancelled from the GitLab web interface, the post_build_script is NOT called. In that case, the SLURM job corresponding to the GitLab job will stay active for CI_JOB_TIMEOUT seconds (generally one hour). Meanwhile, the node will be unavailable for regular SLURM jobs. This is not a problem for new GitLab Runner jobs because the pre_build_script.sh script cancels all its previous SLURM jobs (only on the corresponding partition) before submitting a new one.
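If this happens and the node must be freed before the timeout, a possible manual cleanup is to run, from the front node, the same scancel call that pre_build_script.sh uses:
# cancel the stale CI SLURM jobs of the gitlab-nfs user on the node's partition
scancel -p <the_slurm_partition_here> -u gitlab-nfs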
Source Codes¶
Here are the contents of the pre_build_script.sh and post_build_script.sh scripts located in /nfs/scripts/gitlab-runner. First, pre_build_script.sh:
#!/bin/bash
set -x
# install ssh client if not found and if OS is Debian-like
if ! [ -x "$(command -v ssh)" ]; then
echo 'Warning: ssh client not found.' >&2
if [ -x "$(command -v apt)" ]; then
apt update
apt install -y openssh-client
fi
fi
# just print environment variables and set `SLURM_JOB_TIMEOUT_MIN` variable
echo "CI_JOB_TIMEOUT=${CI_JOB_TIMEOUT}" # used to determine the maximum time (in seconds) of the SLURM job
echo "CI_JOB_ID=${CI_JOB_ID}" # used to set the SLURM job name
SLURM_JOB_TIMEOUT_MIN=$((CI_JOB_TIMEOUT/60))
echo "SLURM_JOB_TIMEOUT_MIN=${SLURM_JOB_TIMEOUT_MIN}" # used to determine the maximum time (in minutes) of the SLURM job
# for the following lines, commands are executed on the front node (through ssh connection) and here are the steps
# 1. cancel other SLURM jobs from the same partition and user
# 2. submit the SLURM job (non-blocking)
# 3. if the SLURM job is not RUNNING just after, print a message
# 4. loop while the previously submitted SLURM job is not RUNNING (passive waiting)
ssh -o "IdentitiesOnly=yes" -o "StrictHostKeyChecking=accept-new" -i /opt/ssh_keys/id_rsa gitlab-nfs@front.mono.proj.lip6.fr /bin/bash << EOF
scancel -p ${SLURM_PARTITION} -u gitlab-nfs
sbatch -p ${SLURM_PARTITION} -J ${CI_JOB_ID}_job --exclusive --nodes=1 --time=${SLURM_JOB_TIMEOUT_MIN} --wrap="sleep ${CI_JOB_TIMEOUT}"
sleep 3
state=\$(squeue -n ${CI_JOB_ID}_job -p ${SLURM_PARTITION} -l | tail -n 1 | awk '{print \$5}')
if [[ "\$state" != "RUNNING" ]]; then echo "Waiting for SLURM job(s) on the same node to be complete..."; fi
while [[ "\$state" != "RUNNING" ]]; do sleep 30; state=\$(squeue -n ${CI_JOB_ID}_job -p ${SLURM_PARTITION} -l | tail -n 1 | awk '{print \$5}'); done
EOF
Then, post_build_script.sh:
#!/bin/bash
set -x
# install ssh client if not found and if OS is Debian-like
if ! [ -x "$(command -v ssh)" ]; then
echo 'Warning: ssh client not found.' >&2
if [ -x "$(command -v apt)" ]; then
apt update
apt install -y openssh-client
fi
fi
# connect to the Monolithe frontend to cancel the CI job in SLURM
ssh -o "IdentitiesOnly=yes" -o "StrictHostKeyChecking=accept-new" -i /opt/ssh_keys/id_rsa gitlab-nfs@front.mono.proj.lip6.fr /bin/bash << EOF
scancel -p ${SLURM_PARTITION} -n ${CI_JOB_ID}_job
EOF
List of Installed Nodes¶
- brubeck.soc.lip6.fr
- front.mono.proj.lip6.fr
- xu4.mono.proj.lip6.fr
- rpi4.mono.proj.lip6.fr
- m1u.mono.proj.lip6.fr (manual install)
- opi5.mono.proj.lip6.fr
- em780.mono.proj.lip6.fr
- x7ti.mono.proj.lip6.fr (manual install)