VSC-5 partitions and QoS¶
On VSC-5, nodes with the same type of hardware are grouped into partitions. The quality of service (QoS), formerly called queues, defines the maximum run time of a job and the number and type of allocatable nodes.
Partitions¶
- Intel CPU nodes: There is only one variant, with Cascadelake CPUs and 384 GB RAM.
- AMD CPU nodes: They all have Zen3 CPUs, but come in three memory versions: 512 GB, 1 TB and 2 TB RAM.
- GPU nodes: There are two versions, one with Zen2 CPUs, 256 GB RAM and 2x NVIDIA A40 GPUs, and one with Zen3 CPUs, 512 GB RAM and 2x NVIDIA A100 GPUs.
These are the partitions on VSC-5:
| Partition | Nodes | Architecture | CPU | Cores per CPU (physical/with HT) | GPU | RAM | Use |
|---|---|---|---|---|---|---|---|
| zen3_0512* | 564 | AMD | 2x AMD 7713 | 64/128 | No | 512 GB | The default partition |
| zen3_1024 | 120 | AMD | 2x AMD 7713 | 64/128 | No | 1 TB | High memory partition |
| zen3_2048 | 20 | AMD | 2x AMD 7713 | 64/128 | No | 2 TB | Higher memory partition |
| cascadelake_0384 | 48 | Intel | 2x Intel Cascadelake | 48/96 | No | 384 GB | Directly use programs compiled for VSC-4 |
| zen2_0256_a40x2 | 45 | AMD | 2x AMD 7252 | 8/16 | 2x NVIDIA A40 | 256 GB | Best for single precision GPU code |
| zen3_0512_a100x2 | 60 | AMD | 2x AMD 7713 | 64/128 | 2x NVIDIA A100 | 512 GB | Best for double precision GPU code |
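To run a job on a specific node type, select the partition together with the matching QoS (see the QoS section below) in the job script. A minimal sketch, assuming a single-node job on the high memory partition; the job name and program are placeholders:

```bash
#!/bin/bash
#SBATCH --job-name=my_job        # placeholder name
#SBATCH --partition=zen3_1024    # request a 1 TB high memory node
#SBATCH --qos=zen3_1024          # QoS matching the partition, see below
#SBATCH --nodes=1

./my_program                     # placeholder for the actual program
```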
Type

```bash
sinfo -o %P
```

on any node to see all the available partitions.
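The limits and node lists of a single partition can be inspected with standard SLURM tools, for example:

```bash
# show the configuration of the default partition
scontrol show partition zen3_0512

# list all partitions with time limit, node count and state
sinfo -o "%P %l %D %t"
```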
For the sake of completeness: there are also special, internally used partitions that cannot be selected manually:
| Partition | Description |
|---|---|
| login5 | Login nodes; not an actual SLURM partition |
| rackws5 | GUI login nodes; not an actual SLURM partition |
| _jupyter | Reserved for the JupyterHub |
Quality of service (QoS)¶
The QoS defines the maximum run time of a job and the number and type of allocatable nodes.
The following QoS are available for all normal (i.e. non-private) projects:
| QoS name | Gives access to partition | Hard run time limit | Description |
|---|---|---|---|
| zen3_0512 | zen3_0512 | 72h (3 days) | Default |
| zen3_1024 | zen3_1024 | 72h (3 days) | High memory |
| zen3_2048 | zen3_2048 | 72h (3 days) | Higher memory |
| cascadelake_0384 | cascadelake_0384 | 72h (3 days) | Intel nodes |
| zen2_0256_a40x2 | zen2_0256_a40x2 | 72h (3 days) | GPU nodes |
| zen3_0512_a100x2 | zen3_0512_a100x2 | 72h (3 days) | GPU nodes |
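For the GPU partitions, combine the partition with its QoS of the same name. A sketch, assuming the GPUs are requested via the generic resource (gres) mechanism; check the VSC GPU documentation for the exact resource syntax:

```bash
#!/bin/bash
#SBATCH --partition=zen3_0512_a100x2
#SBATCH --qos=zen3_0512_a100x2
#SBATCH --gres=gpu:2             # assumption: both A100 GPUs of one node via gres

./my_gpu_program                 # placeholder for the actual program
```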
Tip

The QoS assigned to a specific user can be viewed with:

```bash
sacctmgr show user `id -u` withassoc format=user,defaultaccount,account,qos%40s,defaultqos%20s
```
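The hard run time limits of the QoS can also be queried directly from SLURM, for example:

```bash
# show the maximum wall time per QoS
sacctmgr show qos format=name%25,maxwall
```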
Idle QoS¶
If a project runs out of compute time, jobs of this project run with low priority and a reduced maximum run time in the idle QoS.

There is no idle QoS on the cascadelake_0384 partition or on the zen2_0256_a40x2 / zen3_0512_a100x2 GPU nodes.
| QoS name | Gives access to partition | Hard run time limit | Description |
|---|---|---|---|
| idle_0512 | zen3_0512 | 24h (1 day) | Projects out of compute time |
| idle_1024 | zen3_1024 | 24h (1 day) | Projects out of compute time |
| idle_2048 | zen3_2048 | 24h (1 day) | Projects out of compute time |
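Jobs can also be submitted to the idle QoS explicitly; a sketch, where job.sh is a placeholder for your job script:

```bash
sbatch --partition=zen3_0512 --qos=idle_0512 job.sh
```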
Devel QoS¶
The devel QoS gives users fast feedback on whether their job starts and runs as expected. We recommend a short test run in this QoS before submitting a job to one of the compute queues.
| QoS name | Gives access to partition | Hard run time limit |
|---|---|---|
| zen3_0512_devel | 5 nodes on zen3_0512 | 10min |
| zen3_0512_a100x2_devel | 2 nodes on zen3_0512_a100x2 | 10min |
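A short test run in the devel QoS could look like this, where test_job.sh is a placeholder for your job script:

```bash
sbatch --partition=zen3_0512 --qos=zen3_0512_devel test_job.sh
```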
Private projects¶
Private projects come with their own QoS:
| QoS name | Gives access to partition | Hard run time limit |
|---|---|---|
| p7XXXX_YYYY | various; check with your project manager | up to 240h (10 days) |
where 7XXXX is the project number, and YYYY the RAM size.