Node Configuration

Each compute node is configured with 8 NUMA nodes, each of which has 24 physical cores. Since hyperthreading is enabled, there are 48 logical cores per NUMA node, and consequently each server has a total of 192 physical and 384 virtual cores.

$ lscpu | grep NUMA
NUMA node(s):                            8
NUMA node0 CPU(s):                       0-23,192-215
NUMA node1 CPU(s):                       24-47,216-239
NUMA node2 CPU(s):                       48-71,240-263
NUMA node3 CPU(s):                       72-95,264-287
NUMA node4 CPU(s):                       96-119,288-311
NUMA node5 CPU(s):                       120-143,312-335
NUMA node6 CPU(s):                       144-167,336-359
NUMA node7 CPU(s):                       168-191,360-383
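
The two hardware threads of a physical core thus appear 192 apart in the CPU numbering. This pairing can be verified through sysfs; for example, on these nodes CPU 0 should be paired with CPU 192:

$ cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0,192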

GPU Nodes

Each GPU node has four NVIDIA H100 GPUs, each of which is attached to a particular NUMA node. The exact attachment can be seen with this command:

$ nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    NIC0    NIC1    NIC2    NIC3    NIC4    NIC5    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV6     NV6     NV6     SYS     SYS     SYS     PXB     SYS     SYS     72-95,264-287   3               N/A
GPU1    NV6      X      NV6     NV6     SYS     SYS     PXB     SYS     SYS     SYS     48-71,240-263   2               N/A
GPU2    NV6     NV6      X      NV6     SYS     SYS     SYS     SYS     SYS     PXB     144-167,336-359 6               N/A
GPU3    NV6     NV6     NV6      X      SYS     SYS     SYS     SYS     PXB     SYS     96-119,288-311  4               N/A
NIC0    SYS     SYS     SYS     SYS      X      PIX     SYS     SYS     SYS     SYS
NIC1    SYS     SYS     SYS     SYS     PIX      X      SYS     SYS     SYS     SYS
NIC2    SYS     PXB     SYS     SYS     SYS     SYS      X      SYS     SYS     SYS
NIC3    PXB     SYS     SYS     SYS     SYS     SYS     SYS      X      SYS     SYS
NIC4    SYS     SYS     SYS     PXB     SYS     SYS     SYS     SYS      X      SYS
NIC5    SYS     SYS     PXB     SYS     SYS     SYS     SYS     SYS     SYS      X

From the above matrix one can see that the GPUs are attached to NUMA nodes 2, 3, 4, and 6, and for optimal performance processes should be pinned to these NUMA domains.
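
As an illustrative sketch, a process that uses GPU1 can be restricted to its NUMA node (node 2, CPUs 48-71,240-263) with numactl; my_gpu_app is a placeholder for the actual application:

$ # bind CPU and memory allocation to NUMA node 2 and run on GPU1 only
$ CUDA_VISIBLE_DEVICES=1 numactl --cpunodebind=2 --membind=2 ./my_gpu_app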

WEKA Filesystem

To achieve the best possible performance, the WEKA filesystem requires that a certain number of cores be reserved for its exclusive use.

GPU Nodes

Four virtual cores on each NUMA node (32 virtual cores in total) are reserved for WEKA. Therefore 352 virtual (176 physical) cores are available for SLURM jobs.
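
For illustration, a job that uses all four GPUs could split the 352 available virtual cores evenly over four tasks (88 per task); the GRES name and the placeholder application my_gpu_app are assumptions and depend on the site configuration:

$ srun --nodes=1 --ntasks=4 --gres=gpu:4 --cpus-per-task=88 ./my_gpu_app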

CPU Nodes

Four virtual cores on NUMA node 7 are reserved for WEKA. Hence 380 virtual (190 physical) cores are available for SLURM jobs.
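
As a minimal sketch, a batch job could request all 380 schedulable CPUs of a single CPU node for one task; my_cpu_app is a placeholder for the actual application:

$ sbatch --nodes=1 --ntasks=1 --cpus-per-task=380 --wrap="./my_cpu_app"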

Overview Table

node type   physical/virtual cores (total)   physical/virtual cores per NUMA node
CPU         190/380                          NUMA 0-6: 24/48
                                             NUMA 7:   22/44
GPU         176/352                          NUMA 0-7: 22/44

Tip

The SLURM configuration of a specific node is best inspected with the scontrol command; the CoreSpecCount and CPUSpecList fields show the cores reserved for WEKA:

# scontrol show node n3014-001
NodeName=n3014-001 CoresPerSocket=24
   CPUAlloc=0 CPUEfctv=380 CPUTot=384 CPULoad=0.00
   AvailableFeatures=nogpu,hca_mlx5_0
   ActiveFeatures=nogpu,hca_mlx5_0
   Gres=np_zen4_0768:380
   NodeAddr=n3014-001 NodeHostName=n3014-001
   RealMemory=770000 AllocMem=0 FreeMem=N/A Sockets=8 Boards=1
   CoreSpecCount=2 CPUSpecList=380-383 MemSpecLimit=7000
   State=DOWN+NOT_RESPONDING ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   BootTime=None SlurmdStartTime=None
   LastBusyTime=2025-10-21T11:46:18 ResumeAfterTime=None
   CfgTRES=cpu=380,mem=770000M,billing=380,gres/np_zen4_0768=100