Data Transfer
There are four general methods for getting data to/from a cluster.
Globus¶
Farm, Franklin, and Hive have Globus installed. Home directories for all three clusters are already exported. Because of the way Globus v5 works, each PI directory must be exported manually by HPCCF staff. If you need a PI directory exported, please contact HPC support and CC your PI for approval.
Once the PI group directory is exported, you will be able to read any file you normally have access to, but for security
reasons, you will only be able to write files to /globus-write/Your-Login-ID/
within the Globus File Manager.
Home directories, and PI group directories can be access through the Globus File Manager. For
home directories, search for a collection named UC Davis CLUSTERNAME home
. For PI group directories, search for
UC Davis CLUSTERNAME PI-name
.
Globus Free File Transfer limitations
HPCCF does not have a paid subscription for Globus, and uses the Free File Transfer…for users at non-profit research institutions
tier. This means you either need to have a login on both ends of the transfer, or the remote end must have the paid version.
Command-line tools¶
scp — OpenSSH secure file copy¶
If you only need to move a single file, or a small directory, scp will work well. From your desktop/laptop you can push, or pull, file(s) and/or directories.
- To push a file, you specify the source from your local system, and the destination on a cluster:
-
scp -rp local-file local-directory/ UCD-Login-ID@CLUSTERNAME.hpc.ucdavis.edu:
- To pull a file from a cluster to your local machine:
-
scp -rp UCD-Login-ID@CLUSTERNAME.hpc.ucdavis.edu:location/on/cluster .
The final .
will put the files/directories into the current directory.
scp
arguments:
-
-r
Recursively copy entire directories. Note that scp follows symbolic links encountered in the tree traversal. -
-p
Preserves modification times, access times, and file mode bits from the source file.
See man scp
for the full manual.
rsync - a fast, versatile, remote (and local) file-copying tool¶
- From the
rsync
manual: -
Rsync is a fast and extraordinarily versatile file copying tool. It can copy locally, to/from another host over any remote shell, or to/from a remote rsync daemon. It offers a large number of options that control every aspect of its behavior and permit very flexible specification of the set of files to be copied. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use.
Like scp, rsync can operate in a push, or a pull, mode, depending on which arguments you put first. To use rsync to copy files from your local system, to a cluster, preserving permissions, ownership, and file times, you can use the following command.
rsync --archive --one-file-system --info=progress2 local-directory/ UCD-LOGIN-ID@CLUSTERNAME.hpc.ucdavis.edu:~/destination/
To pull a directory from a cluster to your local machine:
rsync --archive --one-file-system --info=progress2 UCD-LOGIN-ID@CLUSTERNAME.hpc.ucdavis.edu:~/remote-directory/ local-directory/
Explanation of options:¶
--archive
preserves symbolic links, permissions, ownership, file timestamps, and recursively copies directories.--one-file-system
don't cross filesystem boundaries.--info=progress2
show reasonable progress as rsync processes files.
Note: the trailing /
is critical on both the source and the destination, otherwise the files will not end up in the
location you are expecting.
If you want to make the destination directory exactly the same as the source directory, i.e., delete files in the
destination directory that do not exist on the source, you can add the --delete-after
argument to rsync.
--delete-during
: Deletes files on the destination that are not present on the source.
PERMANENT DATA LOSS
If you add the --delete-after
argument to rsync, but use the wrong source or destination directory in the rsync command, YOU MAY CAUSE PERMANENT DATA LOSS on the destination side. Please be extra cautious when having rsync delete files.
As always, see man rsync
for the manual.
Open OnDemand¶
For clusters that have Open OnDemand, you can copy files to/from your home directory. Log in to
the OnDemand dashboard for the cluster, and select Files
-> Home Directory
. Files larger than a couple hundred
megabytes will fail to transfer through OnDemand, so you will need to use a different method.
GUI tools¶
GUI tools do exist to help with file transfer. However, HPCCF cannot provide support for them, so you will need to contact your local IT support, or make an appointment with DataLab for help.
-
Filezilla is a multi-platform client commonly used to transfer data to and from the cluster.
-
Cyberduck is another popular file transfer client for Mac or Windows computers.
-
WinSCP is Windows-only file transfer software.