Why is my job not starting?
This is a very difficult question to answer; there could be many reasons!
Some of them are:
- There are jobs with higher priority ahead of yours. Newly submitted jobs that have a higher priority than yours will be placed ahead of them in the queue.
- Even if you see idle nodes with sinfo, these might have been reserved by the scheduler so that a large job can start.
- Idle nodes may also be part of a private queue, where some nodes are always held available for the owning group.
- You may have requested an impossible combination of resources, for example more memory than the requested node type provides. In this case, your job will never start.
- Your job requested more time than its QoS allows (check the available QoS: VSC-5 QoS / VSC-4 QoS). In this case, too, it will never start.
- You can check the meaning of the reason codes displayed by squeue here.
Warning
If your job has been in the queue for weeks, check that you requested resources appropriately.
- Nodes might be reserved for a particular event, for example a training course.
- The cluster is down!
- Your project has run out of time.
- Your project has run out of core-h. In that case, your jobs are put in one of the idle queues, which have extremely low priority.
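As a rough illustration of reading reason codes, the Python sketch below lists your pending jobs with their Slurm reason code and a one-line explanation. It calls squeue with the standard -u, --states, --noheader and --format options (%i is the job ID, %r the reason). The selection of codes and the explanation texts are a paraphrase of a small subset of the codes documented for squeue, not a VSC-specific list, so look up anything the sketch does not recognize in the full documentation.

```python
from __future__ import annotations

import getpass
import shutil
import subprocess

# Small, non-exhaustive subset of Slurm pending-reason codes (paraphrased).
REASONS = {
    "Priority": "Higher-priority jobs are ahead of yours in the queue.",
    "Resources": "Your job is waiting for resources to become free.",
    "ReqNodeNotAvail": "A required node is unavailable, e.g. down or reserved.",
    "Reservation": "The job is waiting for its reservation to become available.",
    "QOSMaxWallDurationPerJobLimit": "Time limit exceeds the QoS maximum; the job will not start.",
    "PartitionTimeLimit": "Time limit exceeds the partition maximum.",
    "BadConstraints": "The requested constraints cannot be satisfied.",
    "Dependency": "The job is waiting for another job to finish.",
}


def explain(squeue_output: str) -> dict[str, str]:
    """Map job IDs to a short explanation of their pending reason.

    Expects one "<jobid> <reason>" pair per line, as printed by
    squeue --noheader --format="%i %r".
    """
    explanations = {}
    for line in squeue_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        job_id, reason = parts
        # Some reasons carry extra detail after a comma,
        # e.g. "ReqNodeNotAvail, Reserved for maintenance".
        code = reason.split(",", 1)[0].strip()
        explanations[job_id] = REASONS.get(code, f"Not covered here: {reason.strip()}")
    return explanations


def main() -> None:
    if shutil.which("squeue") is None:
        print("squeue not found; run this on a cluster login node.")
        return
    result = subprocess.run(
        ["squeue", "-u", getpass.getuser(), "--states=PENDING",
         "--noheader", "--format=%i %r"],
        capture_output=True, text=True, check=True,
    )
    for job_id, text in explain(result.stdout).items():
        print(f"{job_id}: {text}")


if __name__ == "__main__":
    main()
```

A plain dictionary keeps the sketch easy to extend: add further codes from the squeue documentation as you encounter them.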
Tip
Request resources efficiently. Do not ask for more nodes or time than you need: larger and longer jobs wait longer in the queue. If possible, run tests to estimate the required resources, and use the devel QoS for them!
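To estimate what a request costs before submitting it, the Python sketch below converts a Slurm --time value to hours, computes the core-hours a job would use if it ran for its full time limit on whole nodes, and checks the time limit against a QoS maximum. The cores-per-node value and the QoS limit in the example are hypothetical placeholders, and the whole-node, full-time-limit accounting is an assumption giving an upper bound; look up the real values and accounting rules for your node type and QoS.

```python
from __future__ import annotations


def slurm_time_to_hours(spec: str) -> float:
    """Convert a Slurm --time value to hours.

    Accepts "M", "M:S", "H:M:S", "D-H", "D-H:M" and "D-H:M:S".
    """
    days = 0
    if "-" in spec:
        day_part, spec = spec.split("-", 1)
        days = int(day_part)
        # After "D-", the fields are hours[:minutes[:seconds]].
        fields = [int(x) for x in spec.split(":")]
        fields += [0] * (3 - len(fields))
        hours, minutes, seconds = fields
    else:
        fields = [int(x) for x in spec.split(":")]
        if len(fields) == 1:    # minutes
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:  # minutes:seconds
            hours, minutes, seconds = 0, fields[0], fields[1]
        else:                   # hours:minutes:seconds
            hours, minutes, seconds = fields
    return days * 24 + hours + minutes / 60 + seconds / 3600


def core_hours(nodes: int, cores_per_node: int, walltime: str) -> float:
    """Upper-bound core-hours, ASSUMING whole nodes are charged for the full time limit."""
    return nodes * cores_per_node * slurm_time_to_hours(walltime)


def exceeds_qos(walltime: str, qos_max: str) -> bool:
    """True if the requested time limit is longer than the QoS maximum."""
    return slurm_time_to_hours(walltime) > slurm_time_to_hours(qos_max)


if __name__ == "__main__":
    # Hypothetical values: 128 cores per node and a 3-day QoS limit are
    # placeholders, not figures taken from the VSC documentation.
    print(core_hours(nodes=4, cores_per_node=128, walltime="1-00:00:00"))  # 12288.0
    print(exceeds_qos("4-00:00:00", qos_max="3-00:00:00"))  # True
```

Under these placeholder numbers, a one-day run on 4 nodes could account for up to 12288 core-h, which shows why a short devel test is worth doing first.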