Transfer Options

This wiki page describes a number of data transfer options:

  • methods and tools
  • file server addresses for those tools
  • brief examples

For details of the file system and practical considerations please refer to the relevant wiki page. For an even more practical approach and more information please consider following an introductory course.

Addresses

Addresses to connect with ssh/scp

To contact our HPC systems with ssh (which scp and rsync also use), use the following addresses; these are the same as described in our notes on login procedures:

  • mogon.zdv.uni-mainz.de, or just mogon from within the university net, for Mogon I
  • miil01.zdv.uni-mainz.de, miil02.zdv.uni-mainz.de or miil03.zdv.uni-mainz.de, or just miil01, miil02 or miil03 from within the university net, for Mogon II.

FTP addresses for Mogon I and II

Unencrypted transfer of large amounts of data is faster when the data go directly onto the file servers (see below). The addresses to be used with ftp for the two clusters are:

  • mogonfs.zdv.uni-mainz.de (Mogon I) and
  • mogon2ftp.zdv.uni-mainz.de (Mogon II)

These addresses provide direct access to the file systems1) and are hence meant for direct, unencrypted transfer via (l)ftp or via FTP with FileZilla (see below).

Using scp

scp stands for “secure copy” and works with ssh (the secure shell).

When to use scp

Use Cases | Abuse Cases | Note
transferring single / a few small files | transferring big data |
need to transfer files securely | | see below for safe FTP with FileZilla or with lftp

Basic Syntax

To use scp you need to define the source and the destination. A host needs to be given for at least one of them, source or destination, or for both.

Conventions

For the purpose of the code examples below we will refer to

server=<some name of a server in your institute's basement>
mogon=<either mogon login access>

Example

# to copy a <file> in the current location to a <remote> host into the home directory ('.'):
scp <file> <remote>:.
 
# if the username is different on the remote, you need to supply it:
scp <file> <user>@<remote>:.
 
# if you want the file to go to a different location on the remote, you need to supply it:
scp <file> <remote>:/path/to/location/on/remote/.
 
# you may rename the file, as with the cp program:
scp <file> <remote>:<new_name>
 
# an entire directory and subdirectory tree can be copied recursively with -r:
scp -r <directory> <remote>:/path/to/location/on/remote/.

These examples illustrate how to copy data from your host to a remote. To retrieve data from a remote, the remote server takes the place of the source and the destination becomes, for instance, your host, e.g.:

scp -r <remote>:/path/to/location/on/remote/  <path/on/your/machine>

You can invoke scp on Mogon I / II as the host and treat your destination as the remote, too.
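
As a concrete sketch of the placeholders above, the following snippet fills them with values and prints the resulting scp commands instead of running them. The user name, file name and paths are hypothetical; only the host name is taken from the address list above.

```shell
# Hypothetical values: replace them with your own.
user="jdoe"
mogon="mogon.zdv.uni-mainz.de"      # Mogon I login address (see "Addresses")
file="results.dat"
remote_dir="/path/to/location/on/remote"

# Upload: local file -> remote directory.
upload_cmd="scp $file $user@$mogon:$remote_dir/."
# Download: remote directory tree -> current local directory.
download_cmd="scp -r $user@$mogon:$remote_dir/ ."

# Printed for inspection only; paste a line into your shell to run it.
echo "$upload_cmd"
echo "$download_cmd"
```

Printing the commands first is a cheap way to check that source and destination are in the intended order before any data moves.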

Important Flags

Flag | Meaning
-r | Recursively copy entire directories; source and destination are treated as directory trees. Also see cp.
-C | Enables compression. Files are compressed during the transfer, which may or may not be faster.

Direct copy

“Direct copy” means copying between the machine you are logged on to and Mogon, in either direction.

Here, you can invoke scp for a single file:

scp <filename> <mogon>:/desired/path

or, using the -r flag, an entire directory:

scp -r <directory> <mogon>:/desired/path

Copy from remote host to Mogon I/II and vice versa

The idea is to trigger a command on your desktop that transfers data from a remote storage (e.g. an institute's server) directly to Mogon I or II, without the need to transfer to your desktop first and subsequently to Mogon I or II.

The command line needs the names of both servers:

scp $server:<path to copy from server> mogon:<destination path on mogon>
# example
scp $server:/data/scripts/script.sh mogon:./bin/.

In order to copy entire directories use the -r flag, see above.

If the server is not set up for a direct transfer (e.g. when the two remote hosts cannot reach each other via ssh), the -3 flag can be used to route the transfer through the local host.
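
A sketch of the -3 variant, reusing the example paths from above. The host name of the institute server is hypothetical, and the command is only printed, not executed.

```shell
server="server.example.org"   # hypothetical institute server
mogon="mogon"                 # short name, valid within the university net

# -3 passes the data through the machine where scp is invoked,
# so the two remote hosts do not need to reach each other directly.
relay_cmd="scp -3 $server:/data/scripts/script.sh $mogon:./bin/."
echo "$relay_cmd"
```

Keep in mind that with -3 all data pass through your desktop, so its network connection limits the transfer rate.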

Using rsync

rsync is a utility to transfer and synchronize files between computers (or external drives).

When to use rsync

Use Cases | Abuse Cases
need to synchronize directories | no need to synchronize directories

Note:

  • The need to synchronize often arises in software development, when developing on a remote system and then transferring to an HPC system in order to compile. This is an artificially imposed need. Consider developing on an HPC system and using version management2) for your development cycle. This means shorter turn-around times and better testing opportunities.
  • Other needs might arise from project needs. In the case of big data which need to be in sync, please consider lftp.

Basics

Using rsync is straightforward and best shown by a simple example:

Assume you want to synchronize a <source> directory on your desktop with a known destination (a path) on <mogon>. Here, <mogon> stands for the address to be used with ssh/scp, as rsync uses ssh.

rsync -avzh <source> <mogon>:<destination>

Here,

  • -a stands for archive mode, meaning it will preserve permissions, links, dates etc.
  • -v is for “verbosely report what you are doing” (can frequently be omitted)
  • -z compresses the data during the transfer (may or may not be faster, see the comment on scp -C)
  • -h displays numbers in the output in a human-readable format.
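
One detail the basic call does not show is the trailing slash on <source>: without it, rsync creates the directory itself inside the destination; with it, only the contents are copied. The sketch below demonstrates this with local temporary directories, so no connection to Mogon is needed. The directory and file names are made up, and the rsync calls are skipped if rsync is not installed.

```shell
# Scratch area with a small source tree (names are illustrative only).
work=$(mktemp -d)
mkdir -p "$work/project" "$work/dest_a" "$work/dest_b"
echo "data" > "$work/project/a.txt"

if command -v rsync >/dev/null 2>&1; then
    # No trailing slash: the directory 'project' is created inside dest_a.
    rsync -a "$work/project" "$work/dest_a/"
    # Trailing slash: only the contents of 'project' land in dest_b.
    rsync -a "$work/project/" "$work/dest_b/"
    # -n (dry run) lists what would be transferred without copying anything.
    rsync -avn "$work/project/" "$work/dest_b/"
    ls "$work/dest_a" "$work/dest_b"
fi
```

The same rule applies when the destination is <mogon>:<destination>; a dry run with -n is a safe way to check the outcome before a large transfer.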

Using lftp

What is lftp ?

lftp is a command-line client for several file transfer protocols. This article provides information for a quick start; a comprehensive manual is available in the man page.

When to use lftp

Use Cases | Abuse Cases
fast transfer of big data | need to transfer only a few small files

Getting lftp

If not already installed:

  • On Debian, CentOS and Ubuntu the package is called lftp
  • On macOS lftp can be installed with brew (brew install lftp)

Basics

Here, the same addresses apply as given above; <mogon> is the placeholder for either one of them, <username> for your username, etc.

To connect with lftp, invoke the program and state the host to connect to. You will be asked for your password. You can end the session with the quit or exit command, or by typing CTRL-D:

$ lftp -u <username> <mogon>
Password:
lftp <username>@<mogon>:~> quit

More commands to be used within a lftp session:

Purpose | Command | Notes
listing remote files | ls | -
listing local files | !dir | -
show current remote directory | pwd | -
show current local directory | lpwd | -
change remote directory | cd | same as shell command
change local directory | lcd | much like the shell cd command
remove a remote file | rm | does not understand wildcards
remove multiple remote files | mrm | much like the shell rm command

Examples

Transferring Single Files

This first example assumes that you have files with a .dat suffix in a folder either on your host or on the HPC system. It transfers them to the HPC system (with mput) or retrieves them (with mget):

$ lftp -u <username> <mogon>
Password:
lftp <username>@<mogon>:~> mput *dat
xxx bytes transferred                             
Total n files transferred
lftp <username>@<mogon>:~> mget *dat
xxx bytes transferred                             
Total n files transferred
lftp <username>@<mogon>:~> quit

You can specify a base directory or URL where files should be placed with -O, e.g. mput -O /path/to/dir *dat.

Retrieving all .dat files from mogonfs directly from the command line can be accomplished with this one-liner:

$ lftp -u <username> -e "mget *dat" <mogon>
Password: 
xxx bytes transferred                             
Total n files transferred                                    
lftp <username>@<mogon>:~> quit

With knowledge of the directory this one-liner can get the data from within that directory:

$ lftp -u <username> -e "mget /gpfs/fs2/project/<projectname>/somedir/*dat" <mogon>
xxx bytes transferred                                                            
Total n files transferred
lftp <username>@<mogon>:~> quit
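
Note that with -e the lftp session stays open after the command has run, which is why the transcripts above end with quit. For unattended use, a sketch like the following either appends quit to the command list or uses -c, which executes the given commands and then exits. The user name, project name and local target directory are hypothetical; the commands are only printed.

```shell
user="jdoe"                                              # hypothetical user name
host="mogonfs.zdv.uni-mainz.de"                          # FTP address for Mogon I
remote_glob="/gpfs/fs2/project/myproject/somedir/*dat"   # hypothetical project
target="./incoming"                                      # hypothetical local directory

# Variant 1: -e keeps the session open, so quit explicitly.
cmd_e="lftp -u $user -e \"mget -O $target $remote_glob; quit\" $host"
# Variant 2: -c runs the commands and exits; the host is opened inside the command list.
cmd_c="lftp -c \"open -u $user $host; mget -O $target $remote_glob\""

echo "$cmd_e"
echo "$cmd_c"
```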

Transferring Directories

To mirror a directory to or from our HPC systems, use the mirror command. By default, mirror copies the specified remote source directory to the local target directory; with -R (reverse) it uploads a local directory to the remote host:

  • to transfer the local current directory, if it does not exist on the remote host:
lftp <username>@<mogon>:~> mirror -R      
Total: 1 directory, y files, z symlinks             
New: x files, z symlinks
To be removed: x directories, y files, z symlinks
 
  • As the output shows, you can use mirror to clean up transferred files, too, using the -e option. However, the behavior may be confusing and should be tested.
  • Resuming interrupted transfers can be done with -c.
  • To dereference (and copy) symbolic links as files supply -L.
  • mirror can be used in parallel, too: e.g. --parallel=4 will attempt to transfer 4 files in parallel, which is useful to saturate the bandwidth in case of smaller files.
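
Combining the options above, an upload of a local directory that resumes interrupted files and uses four parallel transfers might look like the following sketch. The user name and directory names are hypothetical, and the commands are only printed. --dry-run is a mirror option that merely reports what would be done, which is a cautious way to test the behavior before a real transfer, especially before adding -e.

```shell
user="jdoe"                          # hypothetical user name
host="mogonfs.zdv.uni-mainz.de"      # FTP address for Mogon I
local_dir="./results"                # hypothetical local directory
remote_dir="results"                 # hypothetical remote directory

# -R: reverse mirror (upload), -c: continue interrupted transfers,
# --parallel=4: transfer four files at once.
mirror_opts="-R -c --parallel=4"

# First a dry run, then the real transfer.
dry_cmd="lftp -u $user -e \"mirror --dry-run $mirror_opts $local_dir $remote_dir; quit\" $host"
real_cmd="lftp -u $user -e \"mirror $mirror_opts $local_dir $remote_dir; quit\" $host"
echo "$dry_cmd"
echo "$real_cmd"
```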

Using Script Files

lftp commands can be put into script files. This is useful when repeating the same transfers often.

  • $ lftp -e <cmd> can be used to carry out one (or a few) commands directly from the command line or from within a shell script.
  • $ lftp -f <scriptfile> can be used to run a <scriptfile> containing (preferably tested) commands
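
A minimal sketch of such a script file is shown below; it is written to a temporary file and printed. The user name, folder and file pattern are hypothetical, whereas open, cd, mput and quit are regular lftp commands.

```shell
script=$(mktemp)

# Write the lftp commands, one per line (quoted heredoc: nothing is expanded).
cat > "$script" <<'EOF'
open -u jdoe mogonfs.zdv.uni-mainz.de
cd testfolder
mput *dat
quit
EOF

cat "$script"
# To run it: lftp -f "$script"
```

Ending the file with quit ensures that lftp terminates once the transfers are done.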

Avoiding Password based Logins

Setting bookmarks is the solution to avoid the need to type your password over and over again:

$ lftp <username>@<mogon>
Password:
lftp <username>@<mogon>:~> set bmk:save-passwords true
lftp <username>@<mogon>:~> bookmark add <mogon>
lftp <username>@<mogon>:~> bookmark list
<mogon> ftp://<username>:XXXX@mogonfs.zdv.uni-mainz.de/
lftp <username>@<mogon>:~> exit

This should then work:

$ lftp <mogon>

Whilst practical, this will store your clear-text password on the computer where you save the bookmark!

Using FileZilla

You can access the home and project directories via FTP (even from outside the university network) by connecting to <mogon> (with mandatory TLS encryption3)).

When using FileZilla, please select File → Site Manager (Datei → Servermanager in German) to reach the dialog below, which shows a sample configuration.

Exemplary configuration of FileZilla

Please Note:

  • Always use “FTP over TLS” if you want a secure connection.
  • FileZilla asks for confirmation the first time you connect. Please confirm and set the tickmark to trust the server for future connection attempts; then no subsequent confirmation is necessary.

Windows Share (SMB/CIFS)

You can access the home and project directories from a Windows machine (within the university network) under the following URL: \\<mogon>\ or by explicitly connecting to \\<mogon>\<username> or \\<mogon>\project. This image shows an example for the user schlarbm:

Exemplary navigation bar of Windows Explorer

Using SSHFS

sshfs is a way to mount the file system of Mogon4) on your local (Linux) desktop. While it is better to log in for most purposes, when developing on your local desktop it offers the possibility to compile directly on Mogon.

We offer IDEs and other development tools directly on Mogon, too.

The following instructions assume you are using a Debian-like system (e.g. the distribution provided by the ZDV). For other distributions similar options apply.

First you need to install sshfs, if not already present:

# apt install sshfs

As root you need to create a mount point5):

# mkdir /mnt/mogon

Subsequently, again whilst being root, start sshfs:

# sshfs -o allow_other <username>@mogon:. /mnt/mogon

Note: This will mount your home directory (:.); alternatively, other remote paths are valid.

Now, for example, you can list your home directory on your local machine:

$ ls /mnt/mogon/

Or you can instruct your IDE to use a directory there …

To unmount perform:

# umount /mnt/mogon

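To make this permanent, edit /etc/fstab and add a line like the following. The fields after the mount point (file system type, mount options, dump and fsck order) follow the usual fstab layout; adapt the options to your system:

sshfs#<username>@mogon:. /mnt/mogon fuse defaults,allow_other 0 0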
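
If you do not have root access on your desktop, sshfs can usually also be run as a regular user with a mount point inside your home directory; unmounting is then done with fusermount -u. The following is a sketch under the assumption that FUSE is set up for non-root users on your system. The user name and mount point are hypothetical, and the commands are only printed.

```shell
user="jdoe"                     # hypothetical user name
mountpoint="$HOME/mogon"        # hypothetical mount point owned by you

mkdir_cmd="mkdir -p $mountpoint"
mount_cmd="sshfs $user@mogon:. $mountpoint"
umount_cmd="fusermount -u $mountpoint"

echo "$mkdir_cmd"
echo "$mount_cmd"
echo "$umount_cmd"
```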

Efficiency Considerations

Why should you bother? After all, rsync or scp work reasonably well, don't they?

For the following example we created files of 1 GiB size with dd if=/dev/urandom bs=64M count=16 iflag=fullblock of=sample_${number}.txt. These files were transferred to Mogon I in one of two ways:

  • either by means of scp via one of the login nodes of Mogon I. The call was scp <fname> mogon:<testfolder> to copy one file to Mogon6). Copying a file from Mogon I to a local disk was done by swapping source and destination in the scp command.
  • or by means of lftp. The command was lftp -u <username> -e "cd <testfolder>; put <fname>" mogonfs.zdv.uni-mainz.de. Likewise, copying a file from Mogon I onto a local disk was done by changing put to get.

To copy multiple files, an asterisk (*) was used for scp and an asterisk together with mput/mget for lftp. The test folder was a project folder. Each transfer type was tested three times.

To test the transfer of 100 GiB, 100 files of 1 GiB each were created and transferred.
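
The test files could be generated with a loop like the following sketch. To keep it quick to try, it is scaled down to three files of 1 MiB each in a temporary directory; the measurements used the dd parameters quoted above (bs=64M count=16) and 100 files.

```shell
outdir=$(mktemp -d)
n_files=3          # the measurements used 100
block_size=1024    # bytes per block (scaled down)
block_count=1024   # 1024 x 1024 bytes = 1 MiB per file

number=1
while [ "$number" -le "$n_files" ]; do
    # Random data is essentially incompressible, like the original test files.
    dd if=/dev/urandom of="$outdir/sample_${number}.txt" \
       bs="$block_size" count="$block_count" 2>/dev/null
    number=$((number + 1))
done

ls -l "$outdir"
```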

As the plotted results show, lftp outperforms scp by 25 - 30 %. While the actual variability may exceed the measured variability, lftp is consistently above 100 MB/s.

Compression was not tested, as the data were random. Actual files (particularly text files), however, can profit a lot from switching on compression during transfer.

When scp, when lftp?

scp is more convenient; lftp is faster because it omits encryption. So scp, also in combination with rsync, is fine for small to medium-sized files. For large transfers the difference matters: for 100 files of 1 GiB each, scp took more than 24 min, lftp only a little more than a quarter of an hour.
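
These timings translate into average throughput as sketched below. The calculation takes 24 min and 15 min as round figures; since both transfers took somewhat longer than that, the printed values are upper bounds.

```shell
# 100 files of 1 GiB each, expressed in MiB.
total_mib=$((100 * 1024))

# Average throughput in MiB/s for a given duration in minutes.
throughput() {
    awk -v mib="$total_mib" -v min="$1" 'BEGIN { printf "%.1f", mib / (min * 60) }'
}

scp_rate=$(throughput 24)
lftp_rate=$(throughput 15)
echo "scp:  at most $scp_rate MiB/s"
echo "lftp: at most $lftp_rate MiB/s"
```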

Get in touch with us if your bandwidth is substantially lower than the values reported here!

1) In the case of Mogon II, still via login nodes.
4) Mogon I or II.
5) The name mogon is merely a suggestion; you may choose a different name, of course.
6) Here, we assume that ssh was configured as described in our wiki.
filesystems/transfer.txt · Last modified: 2019/06/11 11:19 by meesters