Where to store data¶
MUSICA provides several storage locations: a high-performance all-flash WekaIO parallel filesystem, a high-volume IBM Storage Scale parallel filesystem (also known as GPFS), and node-local flash storage. They are accessible under:
- WekaIO:
  $SCRATCH expands to /scratch/fs2XXXXXX/username
- IBM Storage Scale (GPFS):
  $HOME expands to /home/username
  $DATA expands to /data/fs2XXXXXX/username
  where XXXXXX is the project number
- Local SSD:
  /local
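To see where these variables point in your own session, a small sketch like the following can help. It assumes only that the three variables are set by the login environment as described above; the function name show_storage_paths is made up for illustration.

```shell
#!/bin/sh
# Print the storage locations for the current session.
# A variable that is not set is reported as "<not set>" instead of an empty line.
show_storage_paths() {
    printf '$HOME    -> %s\n' "${HOME:-<not set>}"
    printf '$DATA    -> %s\n' "${DATA:-<not set>}"
    printf '$SCRATCH -> %s\n' "${SCRATCH:-<not set>}"
}

show_storage_paths
```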
Home Storage¶
$HOME is the location of the user's UNIX home directory. It is entirely located on NVMe discs. Each user has their own home directory, independent of the project.
The quota on $HOME is 50 GB and 10^6 files per user. The storage size cannot be extended, but the file limit can be increased upon request.
Project Storage¶
$DATA is a tiered file system containing 1PB flash and around 10PB of HDD storage. It stores up to 10% of the data and all meta-data on NVMe discs and the rest on spinning discs. Frequently used files are automatically moved to the NVMe tier, while unused files are moved back to the HDD tier.
The quota on $DATA is 10 TB and 10 million files per project; it can be extended up to 100 TB.
If even more storage is needed, specific arrangements can be made.
Access permissions
The files on $DATA are usually group read-/writable so project members can exchange data.
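If a directory you created ends up without group access, the standard chmod tool can restore it. The sketch below is a generic POSIX example, not a site-specific procedure; the function names are invented for illustration, and it assumes that the directory already belongs to your project group.

```shell
#!/bin/sh
# Make a directory tree readable and writable for the owning group.
# The capital X adds execute (search) permission only to directories and to
# files that are already executable for someone.
share_with_group() {
    chmod -R g+rwX "$1"
}

# Show owner, group and permission bits of the directory itself.
show_permissions() {
    ls -ld "$1"
}
```

For example, share_with_group "$DATA/shared_results" followed by show_permissions "$DATA/shared_results", where shared_results is a hypothetical directory name.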
Check quota
You can check your current quota usage on each of the two storage systems with the following commands, the first for $DATA and the second for $HOME:
mmlsquota --block-size auto -j fs2XXXXX data_ess
mmlsquota --block-size auto -j home_fs2XXXXX home
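The quota report gives totals only. When you are close to the file-count limit, it can help to see which subdirectories hold the most files. The following is a minimal sketch using standard find, wc and sort; the function name count_files_per_subdir is made up here, and hidden subdirectories are skipped because the shell glob does not match them.

```shell
#!/bin/sh
# Print "<file count> <directory>" for each immediate subdirectory of the
# given directory, sorted so that the largest count comes first.
count_files_per_subdir() {
    top=$1
    for d in "$top"/*/; do
        # Skip the unexpanded pattern when there are no subdirectories.
        [ -d "$d" ] || continue
        # tr strips the padding that some wc implementations add.
        n=$(find "$d" -type f | wc -l | tr -d ' ')
        printf '%d %s\n' "$n" "$d"
    done | sort -rn
}
```

For example, count_files_per_subdir "$DATA" lists the subdirectories of your project directory. Note that walking a very large tree with find generates metadata load itself, so run it sparingly.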
Scratch Storage¶
$SCRATCH is a very fast all-flash system intended for scratch (short-lived) data. It has a total size of 4 PB, with a quota of 5 TB per project and no limit on the number of files. An extension is possible upon request.
Data retention
$SCRATCH is intended for temporary data; it is therefore not backed up and may be cleared when space is needed. It is NOT intended for storing data long term.
For running but not for storing¶
/local¶
/local is a locally mounted NVMe disk. Its size is about 7 TB on a GPU node and 2 TB on a CPU-only node.
Data retention is not guaranteed: all data is lost on a reboot, so results need to be transferred elsewhere after use.
/tmp or /dev/shm¶
/tmp and /dev/shm can be used for intensive I/O. They can take up to half of a compute node's memory, because the data resides in the shared memory (RAM) of the node.
The data is deleted after the job, so move results to $DATA.
Warning
/local and /tmp can be useful for running jobs, but they are NOT for permanent storage.
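A typical pattern for node-local storage is to stage input onto the local disk, run there, and copy the results back before the job ends. The sketch below illustrates that pattern with plain POSIX shell. The function name run_staged and its argument order are invented for this example, and it assumes that your job may create directories under the chosen local base (for instance /local).

```shell
#!/bin/sh
# Stage input onto node-local storage, run a command there, copy results back.
# Usage: run_staged INPUT_DIR RESULT_DIR LOCAL_BASE COMMAND [ARGS...]
run_staged() {
    input_dir=$1
    result_dir=$2
    local_base=$3
    shift 3

    # Private working directory on the fast local disk.
    work_dir=$(mktemp -d "$local_base/job.XXXXXX") || return 1

    # Copy the input data (including hidden files) to the local disk.
    cp -R "$input_dir/." "$work_dir/" || return 1

    # Run the workload from inside the working directory and keep its status.
    (cd "$work_dir" && "$@")
    status=$?

    # Copy everything back to persistent storage, even if the command failed.
    # The local copy is removed only after a successful copy-back.
    if mkdir -p "$result_dir" && cp -R "$work_dir/." "$result_dir/"; then
        rm -rf "$work_dir"
    else
        echo "copy-back failed; data left in $work_dir" >&2
        return 1
    fi
    return "$status"
}
```

A call could look like run_staged "$DATA/input" "$DATA/results" /local sh ./simulate.sh, where input, results and simulate.sh are hypothetical names. Passing /dev/shm as the local base uses RAM instead of the local disk.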
Requesting extra storage¶
Submit your request via the project website.
What to store where¶
$HOME¶
User settings, caches, and configuration directories are placed here automatically. Additionally, $HOME can be used for custom configuration, code, scripts, and custom software. Conda environments might grow too large to be kept in $HOME.
It is NOT intended for scientific or research data storage. Heavy I/O to $HOME should also be avoided.
$DATA¶
This is the main project volume. All scientific/research data must go here, including raw input data, final results, and all types of intermediate data. Please remove any data (especially temporary or intermediate data) that is no longer in use; the system is not set up for long-term archiving.
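To find candidates for cleanup, one option is to list files that have not been modified for a while. This is only a sketch with standard find: the function name list_stale_files is made up, the age threshold is your choice, and modification time is merely a hint, since old input files may still be in use.

```shell
#!/bin/sh
# List regular files under a directory that were last modified more than
# the given number of days ago. Nothing is deleted.
list_stale_files() {
    dir=$1
    days=$2
    find "$dir" -type f -mtime +"$days" -print
}
```

For example, list_stale_files "$DATA" 180 prints files untouched for roughly half a year; review the list before removing anything.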
$SCRATCH¶
Use this for temporary data that is needed during jobs or created by jobs, especially data that benefits from fast reads or writes. Data can be removed at any time, so results and other long-term data need to be moved out.