
Data Transfer

There are four general methods for getting data to/from a cluster.

Globus

Farm, Franklin, and Hive have Globus v5 installed. Use the Globus File Manager to access the exported collections.

Home directories

Home directories for all three clusters are already exported. On Globus, you can find the home directory collection by searching for UC Davis CLUSTER-NAME home.

PI directories

Because of the way Globus v5 works, each PI directory must be exported manually by HPC@UCD staff. If you need a PI directory exported, please contact HPC support and CC your PI for approval.

Once the PI group directory is exported, you can find it by searching for a collection named UC Davis CLUSTER-NAME PI-share-name. In the Globus File Manager for that collection, you can read any file you normally have access to, but for security reasons you can only write files to /globus-write/Your-Login-ID/. On the cluster, files written there are stored under your PI's storage:

Farm, Franklin:

/group/Your-PI-Group-grp/globus-write/Your-Login-ID/

Hive:

/quobyte/Your-PI-Group-grp/globus-write/Your-Login-ID/
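As a sketch of the path mapping above, this hypothetical shell function prints where files written through a PI collection end up on each cluster. It assumes the cluster names are given in lowercase and that the on-disk directory is your PI group name followed by -grp, as in the Your-PI-Group-grp placeholder; the function name is illustrative, not a site-provided command.

```shell
# Hypothetical helper: print the cluster-side directory that corresponds to
# /globus-write/<login>/ in a PI collection.
# Usage: globus_write_dir CLUSTER PI_GROUP LOGIN_ID
globus_write_dir() {
    local cluster=$1 group=$2 login=$3
    case "$cluster" in
        farm|franklin) echo "/group/${group}-grp/globus-write/${login}/" ;;
        hive)          echo "/quobyte/${group}-grp/globus-write/${login}/" ;;
        *)             echo "unknown cluster: $cluster" >&2; return 1 ;;
    esac
}
```

For example, with a made-up PI group smithlab and login jdoe, `ls -l "$(globus_write_dir hive smithlab jdoe)"` would list the files that login uploaded through Globus on Hive.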

Globus Free File Transfer limitations

HPC@UCD does not have a paid Globus subscription and uses the Free File Transfer…for users at non-profit research institutions tier. This means that either you need a login on both ends of the transfer, or the remote end must have the paid version.

Command-line tools

scp — OpenSSH secure file copy

If you only need to move a single file or a small directory, scp works well. From your desktop or laptop you can push files and directories to a cluster, or pull them from one.

To push, specify the source on your local system first, followed by the destination on a cluster:

scp -rp local-file local-directory/ UCD-Login-ID@CLUSTERNAME.hpc.ucdavis.edu:

To pull a file from a cluster to your local machine:

scp -rp UCD-Login-ID@CLUSTERNAME.hpc.ucdavis.edu:location/on/cluster .

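In the push example, the trailing : with no path after it puts the files into your home directory on the cluster; remote paths that do not start with / (such as location/on/cluster) are also relative to your home directory. In the pull example, the final . puts the files and directories into your current local directory.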
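A small sketch of the same push and pull with the LOGIN@HOST:PATH argument built in one place. UCD_LOGIN and CLUSTER are placeholder variables (the default values jdoe and farm are made up), the lowercase host name form is an assumption based on CLUSTERNAME.hpc.ucdavis.edu, and hpc_remote is a hypothetical helper, not a site-provided command.

```shell
# Placeholder values (hypothetical): set these to your own login ID and
# to the lowercase cluster name used in CLUSTERNAME.hpc.ucdavis.edu.
UCD_LOGIN=${UCD_LOGIN:-jdoe}
CLUSTER=${CLUSTER:-farm}

# Hypothetical helper: build the LOGIN@HOST:PATH argument that scp expects.
# With no argument the path is empty, which scp treats as your home directory.
hpc_remote() {
    printf '%s@%s.hpc.ucdavis.edu:%s\n' "$UCD_LOGIN" "$CLUSTER" "${1:-}"
}

# Push: copy the local results directory into ~/projects/ on the cluster
# (assumes ~/projects already exists there, giving ~/projects/results).
#   scp -rp results "$(hpc_remote projects/)"
#
# Pull: copy ~/projects/results from the cluster into the current directory.
#   scp -rp "$(hpc_remote projects/results)" .
```

The scp lines are left as comments so the snippet can be pasted into a shell and adjusted before anything is copied.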

scp arguments:

  • -r Recursively copy entire directories. Note that scp follows symbolic links encountered in the tree traversal.

  • -p Preserves modification times, access times, and file mode bits from the source file.

See man scp for the full manual.

rsync - a fast, versatile, remote (and local) file-copying tool

From the rsync manual:

Rsync is a fast and extraordinarily versatile file copying tool. It can copy locally, to/from another host over any remote shell, or to/from a remote rsync daemon. It offers a large number of options that control every aspect of its behavior and permit very flexible specification of the set of files to be copied. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use.

Like scp, rsync can operate in push or pull mode, depending on whether the remote location is given as the source or as the destination. To copy files from your local system to a cluster while preserving permissions, file times, and (where the receiving side is permitted to set it) ownership, you can use the following command:

rsync --archive --one-file-system --info=progress2 local-directory/ UCD-LOGIN-ID@CLUSTERNAME.hpc.ucdavis.edu:~/destination/

To pull a directory from a cluster to your local machine:

rsync --archive --one-file-system --info=progress2 UCD-LOGIN-ID@CLUSTERNAME.hpc.ucdavis.edu:~/remote-directory/ local-directory/

Explanation of options:

  • --archive: recursively copies directories and preserves symbolic links, permissions, file timestamps, and, where the receiving side is permitted to set it, ownership.
  • --one-file-system: do not cross filesystem boundaries when recursing.
  • --info=progress2: show overall progress for the whole transfer, rather than per file.

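Note: the trailing / on the source is critical. local-directory/ copies the contents of local-directory into the destination, while local-directory (without the slash) copies the directory itself, creating destination/local-directory. Keeping the trailing / on the destination as well is a good habit, so the files end up in the location you expect.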
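The effect of the source's trailing slash is easy to check locally before touching a cluster. This sketch builds a throwaway directory tree with mktemp (all directory names are made up) and copies it twice with --archive, once with and once without the slash.

```shell
#!/usr/bin/env bash
# Local demonstration of rsync's trailing-slash rule; everything lives in a
# temporary workspace.
work=$(mktemp -d)
mkdir -p "$work/local-directory"
echo "hello" > "$work/local-directory/data.txt"

# Trailing slash on the source: copy the *contents* of local-directory,
# so data.txt ends up directly in with-slash/.
rsync --archive "$work/local-directory/" "$work/with-slash/"

# No trailing slash on the source: copy the directory itself,
# so data.txt ends up in without-slash/local-directory/.
rsync --archive "$work/local-directory" "$work/without-slash/"

# Show where the file landed in each case, relative to the workspace.
(cd "$work" && find with-slash without-slash -name data.txt | sort)
```

The find output shows with-slash/data.txt and without-slash/local-directory/data.txt. Remove the workspace afterwards with rm -r "$work".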

If you want to make the destination directory exactly the same as the source directory, i.e., delete files in the destination directory that do not exist on the source, you can add the --delete-after argument to rsync.

  • --delete-after: deletes files on the destination that are not present on the source, after the transfer has finished.

PERMANENT DATA LOSS

If you add the --delete-after argument to rsync, but use the wrong source or destination directory in the rsync command, YOU MAY CAUSE PERMANENT DATA LOSS on the destination side. Please be extra cautious when having rsync delete files.
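One way to reduce that risk is to preview the deletions with rsync's --dry-run before running the real command. The wrapper below is a hypothetical sketch, not a site-provided tool: it refuses a source without a trailing slash or a local source that does not exist, shows the itemized changes (deletions appear as *deleting), and only then asks for confirmation. Treating any source containing : as remote is a simplifying assumption.

```shell
# Hypothetical wrapper: preview a mirroring rsync with --dry-run, then ask
# before running it for real with --delete-after.
# Usage: safe_mirror SOURCE/ DESTINATION/
safe_mirror() {
    local src=$1 dest=$2 answer
    # Insist on a trailing slash so the contents, not the directory, are mirrored.
    if [[ "$src" != */ ]]; then
        echo "safe_mirror: source must end in '/': $src" >&2
        return 2
    fi
    # A local source must exist; anything containing ':' is assumed remote.
    if [[ "$src" != *:* && ! -d "$src" ]]; then
        echo "safe_mirror: no such directory: $src" >&2
        return 2
    fi
    # Dry run: --itemize-changes lists every change without applying it.
    rsync --archive --one-file-system --delete-after --dry-run \
        --itemize-changes "$src" "$dest" || return
    read -r -p "Apply these changes to $dest? [y/N] " answer
    if [[ "$answer" != [yY] ]]; then
        echo "safe_mirror: aborted, nothing changed" >&2
        return 1
    fi
    rsync --archive --one-file-system --delete-after --info=progress2 "$src" "$dest"
}
```

For example: `safe_mirror local-directory/ UCD-LOGIN-ID@CLUSTERNAME.hpc.ucdavis.edu:~/destination/`. Reading the *deleting lines before answering y is the point of the exercise.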

As always, see man rsync for the manual.

rclone for UC Davis Box - transfer or sync data to/from UC Davis Box

Because this process launches a web browser for OAuth authorization, we recommend doing the initial setup in a Desktop session through Open OnDemand.

  1. Load the module

    module load rclone

  2. Create a Box App Token

    Box uses OAuth2 for authentication. You need to create an app in the Box developer console.

    1. Go to https://app.box.com/developers/console.
    2. Click Create Platform App → Custom App
    3. Give it a name, select the Purpose (we suggest Automation), and create it.
    4. Select User Authentication (OAuth 2.0)
    5. In the app settings:
        • Add both http://localhost:53682/ and http://127.0.0.1:53682/ to the Redirect URIs (this is rclone’s local authorization server).
        • Enable both the Read all files and folders and the Write all files and folders scopes.
        • Save changes.
  3. Configure rclone for Box

    Run rclone config and follow the prompts:

    1. n) New remote
    2. name> mybox
    3. Storage> box

    When asked:

    • Client ID → Enter from Box app settings.
    • Client Secret → Enter from Box app settings.
    • Box Config File → leave blank
    • Access Token → leave blank
    • Box Sub Type → user (1)
    • Edit advanced config? → No (unless you need special settings).
    • Use auto config? → Yes (if running locally with a browser).

    A browser will open for the Box login:

    1. Click Sign in with SSO.
    2. Enter your @ucdavis.edu email address.
    3. Approve access; rclone then saves the token.
    4. Confirm the new remote, then leave the setup with q) Quit config.
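  4. Test the connection: rclone ls mybox:

    This recursively lists every file in your Box account, with sizes. For a large account, rclone lsd mybox: lists only the top-level folders and is much quicker.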

  5. Common Commands

    • Upload a file: rclone copy ./localfile.txt mybox:/BackupFolder
    • Download a file: rclone copy mybox:/BackupFolder/file.txt ./localdir
    • Sync a folder (make mybox:/RemoteDir match ./localdir, deleting remote files that are not present locally): rclone sync ./localdir mybox:/RemoteDir --progress
  6. Automating Backups

    Example cron job to sync daily at 2 AM:

    0 2 * * * $HOME/bin/rclone-sync.sh

    Then create $HOME/bin/rclone-sync.sh with contents like the following, adjusting the local and remote directories, and make it executable with chmod +x $HOME/bin/rclone-sync.sh.

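    #!/usr/bin/bash
    
    # Cron runs with a minimal environment, so set up the module system first.
    source /etc/profile.d/modules.sh
    
    module load rclone
    
    # Mirror $HOME/data to the Backup folder in Box, logging to a timestamped file.
    rclone sync "$HOME/data" mybox:/Backup --log-file="$HOME/rclone-$(date -Im).log" --log-level INFO
    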

    If you need help with cron, please contact your local IT support; HPC@UCD cannot provide support for cron.
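Because rclone sync makes the Box folder match the local one, a scheduled run while $HOME/data is empty (for example, after it was accidentally cleared) would delete the copies in Box. The function below is a hypothetical guard for the backup script in step 6; its name and the "refuse if missing or empty" policy are assumptions, not something rclone requires.

```shell
# Hypothetical guard: refuse to mirror a missing or empty source, because
# rclone sync would make the Box folder match it by deleting files there.
# Usage: guarded_sync LOCAL_DIR REMOTE
guarded_sync() {
    local src=$1 dest=$2
    if [[ ! -d "$src" || -z "$(ls -A "$src")" ]]; then
        echo "guarded_sync: $src is missing or empty; skipping sync" >&2
        return 2
    fi
    rclone sync "$src" "$dest" \
        --log-file="$HOME/rclone-$(date -Im).log" --log-level INFO
}

# In $HOME/bin/rclone-sync.sh, after "module load rclone", the rclone sync
# line would become:
#   guarded_sync "$HOME/data" mybox:/Backup
```

rclone's --max-delete option can also cap how many files a single sync is allowed to delete.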

Open OnDemand

For clusters that have Open OnDemand, you can copy files to/from your home directory. Log in to the OnDemand dashboard for the cluster, and select Files -> Home Directory. Files larger than a couple hundred megabytes will fail to transfer through OnDemand, so you will need to use a different method.

GUI tools

GUI tools do exist to help with file transfer. However, HPC@UCD cannot provide support for them, so you will need to contact your local IT support, or make an appointment with DataLab for help.

  • FileZilla is a multi-platform client commonly used to transfer data to and from the cluster.

  • Cyberduck is another popular file transfer client for Mac or Windows computers.

  • WinSCP is a Windows-only file transfer client.