Quobyte

Quobyte is a fault-tolerant, high-performance parallel file system. There is an extensive whitepaper for those interested in the details.

Quobyte is currently available on both Farm and Hive. Farm storage can be purchased through Hippo. Hive storage can be purchased by contacting support with the amount to purchase and a campus chart string. Share names are normally the PI's Login ID followed by grp, e.g. jrigrp. For campus organizations, we can also name the share after your lab.

Quobyte concurrent file writes from multiple nodes

Due to a known pathology in the Quobyte parallel file system, when multiple nodes write to the same file, lock contention builds until writes to that file block entirely and no more data can be written to it. slurmd then cannot finalize the affected tasks, so the nodes are kicked out of the cluster, and an administrator has to intervene to bring them back.

Warning

Because this affects the entire cluster, jobs found doing this are subject to being killed, and the user's account may be temporarily locked.

Unique filenames can be generated by combining the host name with variables that Slurm sets.

For a normal job: OUTPUT_FILE="$(hostname)-${SLURM_JOB_ID}.results"

For an array job: OUTPUT_FILE="$(hostname)-${SLURM_JOB_ID}_${SLURM_ARRAY_TASK_ID}.results"
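The two patterns above can be combined into one job-script fragment that picks the right name depending on whether the job is an array job. This is a minimal sketch: the function name `output_file_name` is hypothetical, and the `nojob` fallback is an assumption added only so the script also runs outside Slurm for testing.

```shell
#!/usr/bin/env bash
# Build a per-node output filename from the host name and Slurm's variables,
# so no two nodes ever write to the same file.
output_file_name() {
    # "nojob" fallback is an assumption so the sketch runs outside Slurm.
    local job_id="${SLURM_JOB_ID:-nojob}"
    if [[ -n "${SLURM_ARRAY_TASK_ID:-}" ]]; then
        # Array job: append the array task index, matching the pattern above.
        printf '%s-%s_%s.results\n' "$(hostname)" "$job_id" "$SLURM_ARRAY_TASK_ID"
    else
        # Normal job: host name plus job ID.
        printf '%s-%s.results\n' "$(hostname)" "$job_id"
    fi
}

OUTPUT_FILE="$(output_file_name)"
# Replace this echo with your real program; only this task writes OUTPUT_FILE.
echo "results from $(hostname)" > "$OUTPUT_FILE"
```

Once the job finishes, the per-node files can be combined from a single process (for example on the login node), which keeps all writes to the final file on one host.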

If you are writing software in which several processes must write to the same file, use file locks to coordinate their access.
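As an illustration of file locking, the sketch below serializes appends to a shared file with an exclusive lock taken through the util-linux `flock` command; programs can do the same with the `flock(2)` or `fcntl(2)` system calls. The file names are hypothetical, and the sketch assumes that `flock` is installed and that the file system honors these locks across all the processes involved. Whether that holds on Quobyte across nodes is not stated here, so check with support before relying on it; writing one file per node remains the safer option.

```shell
#!/usr/bin/env bash
# Sketch: serialize appends to a shared file with an exclusive flock lock.
# Assumes util-linux flock and that the file system honors the lock.
SHARED_FILE="shared.results"      # hypothetical shared output file
LOCK_FILE="${SHARED_FILE}.lock"   # separate, hypothetical lock file

append_locked() {
    (
        # Block until this subshell holds an exclusive lock on descriptor 9.
        flock -x 9
        printf '%s\n' "$1" >> "$SHARED_FILE"
    ) 9> "$LOCK_FILE"   # the lock is released when the subshell exits
}

append_locked "result from $(hostname) pid $$"
```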

We have an open feature request to allow an error to be returned in these cases. In the meantime, please ensure you do not write to the same file from multiple nodes.