Rsyncing to AWS EFS



The rsync problem

Rsync over AWS EFS is painfully slow. This is because AWS EFS was designed to be massively parallel rather than efficient at consuming long queues of I/O requests sequentially. Unfortunately rsync is old school and operates in a very sequential fashion, and its poor interaction with AWS EFS can be seen in typical kernel iowait states of greater than 70%.

It's a shame the two technologies aren't good bedfellows as rsync is very smart at just sending down the change deltas and minimising network and disk traffic.

Wouldn't it be just grand if AWS EFS came with its own rsync daemon interface?

Alas, we just have to be a little more efficient with how we use rsync over AWS EFS, which the sections below cover.

NOTE: By default AWS EFS operates at the entry level tier which has a very low bandwidth and burst credit limit. You need to raise the tier to a more practical level for it to be useful which also corresponds to a significant increase in monthly charges - this is the price you pay for performant HA managed by someone else.



Rsync Basics

A typical rsync job spawns 3 processes: the generator, the sender and the receiver.

When not using daemon mode, rsync communication between the generator, sender and receiver is via basic pipes. If using rsh or ssh then sockets are established between the two hosts as the transport layer, however rsync still communicates with its other processes via basic pipes. The generator process walks the file list, and when a file is scheduled for an update the sender and receiver compare it against the existing destination copy (the basis file) to figure out which blocks need updating and what needs to be sent over the wire - this works extremely well for large files that have only changed a little.

Below is a simplified mud-map of how the rsync receiver source code behaves.
(based on rsync 3.0.6)

receiver.c
  recv_files()
    if (option:inplace) 
      fd2 = do_open(fname, O_WRONLY|O_CREAT, 0600);
      Target file fd handle for opening is created, but no writes occur just yet.
    else
      fd2 = open_tmpfile(fnametmp, fname, file);
      Temp file fd handle for opening is created, but no writes occur just yet.

    Send data to file, but still not directly.
    recv_ok = receive_data(f_in, fnamecmp, fd1, st.st_size, fname, fd2, F_LENGTH(file));
    close(fd1)
    close(fd2)
    finish_transfer(fname, fnametmp, fnamecmp, partialptr, file, recv_ok, 1)
    do_unlink(fnametmp)

  receive_data(int f_in, char *fname_r, int fd_r, OFF_T size_r, const char *fname, int fd, OFF_T total_size)
    recv_token(f_in, &data)
      write_file(fd,data,i)

  open_tmpfile(char *fnametmp, const char *fname, struct file_struct *file)
    get_tmpname(char *fnametmp, const char *fname)
    do_mkstemp(fnametmp, file->mode & INITACCESSPERMS)

fileio.c
  write_file(int f, char *buf, int len) - unique block writing algorithm
    flush_write_file(int f)
      write(f, bp, wf_writeBufCnt) UNIX-3C

rsync.c
  finish_transfer()
    if (option:inplace) 
      fnametmp = fname
      goto x
    set_file_attrs(fnametmp, file, NULL, fnamecmp, ok_to_set_time ? 0 : ATTRS_SKIP_MTIME);
    ret = robust_rename(fnametmp, fname, temp_copy_name, file->mode & INITACCESSPERMS);
    x:
      set_file_attrs(fnametmp, file, NULL, fnamecmp, ok_to_set_time ? 0 : ATTRS_SKIP_MTIME);
      if (temp_copy_name)
        do_rename(fnametmp, fname)
      
util.c
  robust_rename()
    do_rename(from, to)
    if EXDEV External disk device
      copy_file(from, to)
      do_unlink(from)

  copy_file()
    ifd = do_open(source, O_RDONLY, 0)
    ofd = do_open(dest, O_WRONLY | O_CREAT | O_TRUNC | O_EXCL, mode)
    while ((len = safe_read(ifd, buf, sizeof buf)) > 0)
      full_write(ofd, buf, len)
    close(ifd)
    close(ofd)

  safe_read()
    n_chars = read(desc, ptr, len) UNIX-3C

  full_write()
    written = write(desc, ptr, len) UNIX-3C

syscall.c
  do_open()
    open(pathname, flags | O_BINARY, mode) UNIX-3C

  do_mkstemp()
    mkstemp() UNIX-3C

  do_rename()
    rename(fname1, fname2) UNIX-3C

  do_unlink()
    unlink(fname) UNIX-3C


Enter AWS EFS (NFS 4.1)

AWS EFS is presented as a local NFS mount point, so now the rsync receiver process will run on the source host. The efficient file transfer mechanisms used by rsync still work as expected, but they now generate extra I/O requests on the AWS EFS file system. If your files are big then this may still be advantageous, however if you have tens of thousands of small/medium files then it will be disadvantageous and start eating into both your AWS EFS bandwidth and burst credits, or even adding to your kernel's iowait state.

To reduce the above impact you can look at the rsync options --whole-file and --inplace, each discussed below.

You should also use the --omit-dir-times and --numeric-ids options, recommended processing optimisations for NFS (or in our case, AWS EFS).

The option --whole-file is dumbing down rsync, but who said NFS was smart? Thanks to the broken NFS feature called "readdirplus", the task of getting file attributes for directories containing tens of thousands of files is overly chatty and will perform atrociously over a high latency network link. To give you some perspective, in our testing we found NFS to be 100 times slower than SSHFS under these circumstances. Unfortunately AWS EFS doesn't have an SSHFS plugin for us (an rsync daemon interface would be better still).

The option --inplace is a little problematic with files that are in use. There is a very small chance that an end-user may query a file that rsync hasn't finished updating yet. Also, the rsync source code logic for updating a file with the --inplace option uses a different write method from the one used with temporary work files. The --inplace write method appears to hold the file descriptor open for longer than necessary, is less atomic, and is optimised for updating bits of files; because we are using the --whole-file option we really don't want all that overhead.

By not using the --inplace option, rsync defaults to creating temporary work files adjacent to the files being updated. This is doubly bad for your AWS EFS bandwidth and burst credit limits. However, this is easily fixed by specifying the rsync --temp-dir option and pointing it to a local (non AWS EFS) file system. The rsync source code logic for updating files in these circumstances appears to become much simpler - the new updated file-to-be is first staged in the temp-dir and is then atomically put in place whole using basic libc functions. This appears to be more NFS (AWS EFS) sympathetic.

Strangely enough, with or without the --inplace option I still observed temp-dir files being created. This leads me to believe that the --temp-dir option is a must for AWS EFS, as you don't want these temporary rsync files eating into your AWS EFS bandwidth and burst credits or adding to your kernel's iowait state. There are some disk space and security issues with using temp-dir, well documented in the rsync man page, that you should be aware of.

Thus I've found the ideal rsync options for syncing to AWS EFS are:

--numeric-ids --omit-dir-times --whole-file --temp-dir="<local-file-system>"

This still won't yield super fast performance for rsync over AWS EFS, and if you have folders containing thousands of files then you will also be subjected to the slow-performing NFS protocol with its "readdirplus" bug, which is further exacerbated over a high latency network link. At the very least these options provide you with the most optimised way of running an rsync from local disk to an AWS EFS destination. From this point onwards you have a sound basis for breaking down one big rsync job into many smaller parallel tasks.

As an aside, I also needed to introduce a process that purges stale temporary files in temp-dir.



Rsync corrupting files

In my endeavours syncing tens of thousands of changing files every hour to AWS EFS, I noticed a few files would end up with corrupted file attributes. Often these files would have file permissions of "---------" (mode 000), be zero bytes in size, and sometimes have corrupted uid/gid values and modification timestamps.

I am yet to figure out the real cause of this problem (rsync or AWS EFS). Fortunately the rate of occurrence is very small - less than 0.01% - but once created these files are a real pain for the systems that have to work with them.

The problem can often only be fixed by deleting the affected file as user root. This means I need to regularly scan the AWS EFS file system for these problem files and deal with them.

I use both the rsync --timeout option and a governing process to ensure my rsyncs complete within a specified time window. The governing process sends a kill -SIGUSR1 followed a few seconds later by a kill -SIGKILL. Unfortunately I've found that kill -SIGUSR1 does not cause rsync to exit in a timely manner over AWS EFS, if at all.

The rsync source code shows that all temporary or not-yet-ready files are created with permissions of "-rw-------" (mode 600) and then changed to the correct mode after they have been put in place at the destination. This implies that the problem could lie at a lower level (e.g. the kernel), or even with AWS EFS itself.

This is a challenging problem to debug as it involves rsync, NFS, the kernel and AWS EFS - I have no visibility of how AWS EFS is implemented in AWS (actually I've been told but I'm not allowed to say). As an early adopter of AWS EFS, I guess we just have to wait until more people start hitting these same kinds of problems.



arthurguru, 2017.