Quobyte
Quobyte is a fault-tolerant, high-performance parallel file system. There is an extensive whitepaper for those interested in the details.
Quobyte is currently available on both Farm and Hive. Farm storage can be purchased through
Hippo. Hive storage can be purchased by contacting
support with the amount to purchase and a campus chart string. Share names are normally the PI's login
ID followed by grp, e.g. jrigrp. For campus organizations, we can also name the share after your lab.
Quobyte concurrent file writes from multiple nodes
Due to a known pathology in the Quobyte parallel file system, multiple nodes writing to the same file cause lock
contention and, eventually, a complete blockage that prevents any further data from being written to that file.
slurmd then cannot finalize the affected tasks, so the nodes get kicked out of the cluster and require admin
intervention to resolve.
Warning
Because this causes cluster-wide impact, jobs found doing this are subject to being killed and the user's account temporarily locked.
To avoid this, have each node write to its own file. Unique filenames can be generated by combining the host name with variables that Slurm sets.
For a normal job: OUTPUT_FILE="$(hostname)-${SLURM_JOB_ID}.results"
For an array job: OUTPUT_FILE="$(hostname)-${SLURM_JOB_ID}_${SLURM_ARRAY_TASK_ID}.results"
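As a minimal sketch, assuming a bash batch script, the two patterns above can be combined so one script handles both normal and array jobs. The `:-local` fallback is an assumption added only so the snippet can also be tried outside Slurm, and `my_program` is a hypothetical placeholder for your own command.

```shell
#!/bin/bash
# Sketch of a batch-script fragment that gives every node its own output file.
# The ":-local" fallback is an assumption so this also runs outside Slurm.
JOB_ID="${SLURM_JOB_ID:-local}"

if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    # Array job: add the array task index so each task's file is distinct.
    OUTPUT_FILE="$(hostname)-${JOB_ID}_${SLURM_ARRAY_TASK_ID}.results"
else
    # Normal job: host name plus job ID gives one file per node.
    OUTPUT_FILE="$(hostname)-${JOB_ID}.results"
fi

echo "This node writes to: ${OUTPUT_FILE}"
# my_program is a hypothetical placeholder for your own command:
# my_program > "${OUTPUT_FILE}"
```

In a multi-node job each node resolves `$(hostname)` to its own name, so no two nodes end up writing to the same file.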
If you are writing your own software that writes to shared files, it needs to use file locks.
We have an open feature request for an error to be returned in these cases instead of the write blocking. In the meantime, please make sure you do not write to the same file from multiple nodes.
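If you need a single file in the end, one option is to let each node write its own file and merge the files afterwards from a single process, once every writer has finished (for example, in a follow-up job or an interactive session). The following is a minimal POSIX shell sketch; the `merge_results` function and the `combined.txt` output name are hypothetical and not part of any cluster tooling.

```shell
#!/bin/sh
# Hypothetical helper: merge_results DIR OUT
# Concatenates every DIR/*.results file into OUT. Run it from ONE process
# only, after all writers are done, so OUT never has concurrent writers.
# Choose an OUT name that does not end in .results, so it is not merged into itself.
merge_results() {
    local dir="$1" out="$2" f found=0

    # First pass: make sure at least one per-node file exists.
    for f in "$dir"/*.results; do
        if [ -e "$f" ]; then found=1; fi
    done
    if [ "$found" -eq 0 ]; then
        echo "merge_results: no .results files in $dir" >&2
        return 1
    fi

    # Truncate OUT, then append each per-node file in glob (sorted) order.
    : > "$out"
    for f in "$dir"/*.results; do
        cat "$f" >> "$out"
    done
}

# Example: merge_results /path/to/output combined.txt
```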