Where to store data

VSC-4 and VSC-5 provide two kinds of storage: a high-performance IBM Storage Scale parallel filesystem (formerly IBM Spectrum Scale, also known as GPFS) and node-local flash storage. The parallel filesystem is shared by both clusters, so projects that exist on both clusters can access the same data.
The storage locations are accessible under:

  • IBM Spectrum Scale:
    $HOME expands to /home/fsXXXXX/username
    $DATA expands to /gpfs/data/fsXXXXX/username
    where XXXXX is the project number
  • Local SSD
    /local

Home Storage

$HOME is the user's UNIX home directory. It is located entirely on NVMe disks.
The quota on $HOME is 100GB and 10^6 (one million) files for the entire project. The storage size cannot be extended, but the file limit can be increased upon request.

Project Storage

$DATA is a tiered file system containing 500TB of flash and around 5PB of HDD storage. It stores up to 10% of the data and all metadata on NVMe disks and the rest on spinning disks. Frequently used files are automatically moved to the NVMe tier, while unused files are moved back to the HDD tier.
The quota on $DATA is 10TB and 10^6 (one million) files for the project and can be extended up to 100TB. If even more storage is needed, specific arrangements can be made.

Access permissions

The files on $DATA are usually group read-/writable so project members can exchange data.
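
If a directory on $DATA turns out not to be group-accessible, the permissions can be adjusted by hand. The following is a minimal sketch, not a VSC-provided tool; the function name share_with_group and the example directory are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: make a directory tree readable and writable for the owning group.
# The function name is illustrative and not part of any VSC tooling.
share_with_group() {
    local target="$1"
    # g+rw : the group may read and write.
    # X    : add execute only to directories (and to files that are already
    #        executable), so directories stay traversable.
    chmod -R g+rwX "$target"
}
```

For example, share_with_group "$DATA/shared_results" (a hypothetical directory) would open that tree to the other project members. Running ls -ld on a directory shows its current mode and owning group.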

Check quota

You can check your current quota usage on each of the two storage systems with the following commands (the first for $DATA, the second for $HOME):

mmlsquota --block-size auto -j data_fs7XXXX data
mmlsquota --block-size auto -j home_fs7XXXX home
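
When the file-count limit is the problem rather than the volume, it helps to see which subdirectories hold the most files. The sketch below uses only standard find, wc and sort; the function name count_files_per_dir is hypothetical. It gives an approximation only, and the numbers reported by mmlsquota remain authoritative.

```shell
#!/usr/bin/env bash
# Sketch: count regular files below each immediate subdirectory of a root
# directory and print the counts, largest first.
# Approximation only: the filesystem's own quota accounting may differ.
count_files_per_dir() {
    local root="${1:-.}"
    local d
    for d in "$root"/*/; do
        [ -d "$d" ] || continue
        # tr strips the padding that some wc implementations add.
        printf '%s\t%s\n' "$(find "$d" -type f | wc -l | tr -d ' ')" "${d%/}"
    done | sort -rn
}
```

Calling count_files_per_dir "$DATA" prints one line per subdirectory, with the file count first and the path second.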

For running but not for storing

/local

/local is a locally mounted NVMe disk with a size of about 450GB on VSC-4 and 1.8TB on VSC-5. Data retention is not guaranteed and all data is lost on a reboot, so any results must be copied elsewhere (for example to $DATA) after use.

/tmp

/tmp can be used for I/O-intensive work. Its data resides in the shared memory (RAM) of the node and can take up to half of a compute node's memory. The data is deleted after the job, so move results to $DATA.

Warning

/local and /tmp might be useful to run jobs but they are NOT for permanent storage.
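
A typical pattern for using node-local storage is to stage the input, run in the scratch directory, and copy the results back before the job ends. The following is a sketch under stated assumptions, not an official VSC job template: the function name run_staged, the SCRATCH_BASE variable, and the paths in the usage example are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: stage an input file to node-local scratch, run a command there,
# then copy everything back to a permanent directory.
# SCRATCH_BASE and the function name are illustrative assumptions.
run_staged() {
    local input="$1" result_dir="$2"
    shift 2
    local scratch_base="${SCRATCH_BASE:-/local}"
    local workdir
    workdir="$(mktemp -d "${scratch_base}/job.XXXXXX")" || return 1

    # Stage the input; give up and clean the scratch directory on failure.
    cp "$input" "$workdir/" || { rm -rf "$workdir"; return 1; }

    # Run the remaining arguments as a command inside the scratch directory.
    local status=0
    ( cd "$workdir" && "$@" ) || status=$?

    # Copy the scratch contents (including the staged input) back.
    mkdir -p "$result_dir" && cp -r "$workdir"/. "$result_dir"/ || status=1

    # Node-local data is not kept, so remove it explicitly.
    rm -rf "$workdir"
    return "$status"
}
```

A call could look like run_staged "$DATA/input.dat" "$DATA/results" "$HOME/bin/my_program" input.dat, where my_program stands in for your own executable. Setting SCRATCH_BASE=/tmp would use the RAM-backed /tmp instead of /local.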

Requesting extra storage

Submit your request via the project website.

What to store where

$HOME

User settings and various cache and config directories are placed here automatically. Additionally, $HOME can be used for custom configuration, code and scripts, custom software and environments (such as conda). Do NOT store any scientific or research data here; this includes original input data as well as final results. Such data grows large very quickly, and heavy I/O to $HOME should also be avoided.

$DATA

This is the main project volume, so all scientific/research data must go there, including raw input data, final results and all types of intermediate data. Please remove any data (especially temporary or intermediate data) that is no longer in use - the system is not set up for long-term archiving.
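
To find candidates for cleanup, files that have not been modified for a long time can be listed with standard find. This is a minimal sketch; the function name list_stale_files and the 90-day default are arbitrary assumptions, and the function only lists files and never deletes anything.

```shell
#!/usr/bin/env bash
# Sketch: list regular files not modified for more than roughly N days.
# The function name and the default of 90 days are illustrative choices.
list_stale_files() {
    local root="$1" days="${2:-90}"
    # -mtime +N matches files whose modification time is more than N
    # full 24-hour periods in the past.
    find "$root" -type f -mtime +"$days" -print
}
```

For example, list_stale_files "$DATA" 180 prints files under $DATA that have been untouched for about half a year, which can then be reviewed before removal.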