    Try to include userfaultfd with criu (part 2) · 57891afc
    Adrian Reber authored
    This is a first try to include userfaultfd with criu. Right now it
    still requires a "normal" checkpoint. After checkpointing the
    application it can be restored with the help of userfaultfd.
    
    All restored pages with MAP_ANONYMOUS and MAP_PRIVATE set are marked as
    being handled by userfaultfd.
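
    The selection rule above can be sketched as a simple flag test. This
    is only an illustration: mapping_can_be_lazy() is a hypothetical
    helper name, not a function taken from this patch, and a real check
    would probably have to look at more than the mmap() flags.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the patch): returns true when a mapping
 * created with these mmap() flags is both private and anonymous, which is
 * the condition the text gives for handing the pages to userfaultfd.
 */
static bool mapping_can_be_lazy(int flags)
{
	const int required = MAP_ANONYMOUS | MAP_PRIVATE;

	return (flags & required) == required;
}
```

    With this sketch a MAP_PRIVATE | MAP_ANONYMOUS mapping qualifies,
    while shared or file-backed private mappings do not.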
    
    As soon as the process is restored it blocks on the first memory access
    and waits for the pages to be transferred via userfaultfd.
    
    To handle the required pages a new criu command has been added. For a
    userfaultfd supported restore the first step is to start the
    'lazy-pages' server:
    
      criu lazy-pages -v4 -D /tmp/3/ --address /tmp/userfault.socket
    
    This waits on a unix domain socket (defined using the --address option)
    to receive a userfaultfd file descriptor from a '--lazy-pages' enabled
    'criu restore':
    
      criu restore -D /tmp/3 -j -v4 --lazy-pages \
      --address /tmp/userfault.socket
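
    Passing a file descriptor over a unix domain socket is usually done
    with an SCM_RIGHTS control message. The following is a minimal
    sketch of that mechanism only. The names sketch_send_fd() and
    sketch_recv_fd() are made up for this illustration; criu has its own
    send_fd()/recv_fd() helpers, which are not reproduced here.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer large enough for one fd, aligned for struct cmsghdr. */
union fd_ctl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Illustrative only: send one fd over a connected AF_UNIX socket. */
static int sketch_send_fd(int sock, int fd)
{
	char byte = 0; /* at least one byte of real data must accompany the fd */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Illustrative only: receive one fd; returns the new fd or -1 on error. */
static int sketch_recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

    The receiver gets a new descriptor number that refers to the same
    open file description as the sender's, which is what allows a
    separate server process to service faults on the restored task's
    userfaultfd.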
    
    In the first step the VDSO pages are pushed from the lazy-pages server
    into the restored process. After that the lazy-pages server waits on the
    UFFD FD for a UFFD requested page. If there are no requests received
    during a period of 5 seconds the lazy-pages server switches into a mode
    where the remaining, non-transferred pages are copied into the
    destination process. After all remaining pages have been copied the
    lazy-pages server exits.
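
    The "serve requests until idle, then copy the rest" behaviour can be
    sketched with poll() and a timeout. This is an assumption about one
    possible structure, not the code from this patch: the function names
    are hypothetical, and a single byte read from a generic fd stands in
    for a real userfaultfd message.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

enum wait_result { WAIT_ERROR = -1, WAIT_TIMEOUT = 0, WAIT_REQUEST = 1 };

/*
 * Wait up to timeout_ms for fd to become readable. If poll() is
 * interrupted by a signal it is restarted with the full timeout.
 */
static enum wait_result wait_for_request(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return WAIT_ERROR;
	if (ret == 0)
		return WAIT_TIMEOUT;
	return (pfd.revents & POLLIN) ? WAIT_REQUEST : WAIT_ERROR;
}

/*
 * Handle requests until fd has been idle for timeout_ms. Returns the
 * number of requests seen, or -1 on error. After it returns, a caller
 * would switch to copying all pages that were never requested.
 */
static int serve_until_idle(int fd, int timeout_ms)
{
	int handled = 0;

	for (;;) {
		char req; /* placeholder for a real fault message */
		enum wait_result w = wait_for_request(fd, timeout_ms);

		if (w == WAIT_ERROR)
			return -1;
		if (w == WAIT_TIMEOUT)
			return handled;
		if (read(fd, &req, 1) != 1)
			return -1;
		handled++;
	}
}
```

    In terms of the description above, the server would call something
    like serve_until_idle(uffd, 5000) and then enter the copy phase.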
    
    The first page that is usually requested is a VDSO page. The process
    currently used for restoring has two VDSO pages, but only one is
    requested via userfaultfd. In the second part, where the remaining
    pages are copied into the process, the second VDSO page is also copied
    as it has not been requested previously. Unfortunately, even though
    this page has not been requested before, it is not accepted by
    userfaultfd: EINVAL is returned. The reason for EINVAL is not
    understood, and therefore the VDSO pages are copied into the process
    first, before switching to request mode and copying the pages which
    are requested via userfaultfd. To decide at which point the VDSO pages
    can be copied into the process, the lazy-pages server currently waits
    for the first page requested via userfaultfd. This is one of the VDSO
    pages. To avoid copying a page a second time, which is unnecessary and
    not possible, there is now a check to see if the page has been
    transferred previously.
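
    One way to implement such a "transferred previously" check is a
    bitmap with one bit per page. The sketch below is an assumption made
    for illustration; the struct and function names are hypothetical and
    the patch may track transferred pages differently.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical tracker: one bit per page of the restored process. */
struct page_tracker {
	unsigned long *bits;
	size_t nr_pages;
};

static int tracker_init(struct page_tracker *t, size_t nr_pages)
{
	size_t words = (nr_pages + BITS_PER_WORD - 1) / BITS_PER_WORD;

	/* Allocate at least one word so a zero-page tracker is still valid. */
	t->bits = calloc(words ? words : 1, sizeof(unsigned long));
	t->nr_pages = nr_pages;
	return t->bits ? 0 : -1;
}

/*
 * Mark page idx as transferred. Returns true if the caller should copy
 * the page now, false if it was already transferred or is out of range.
 */
static bool tracker_claim(struct page_tracker *t, size_t idx)
{
	unsigned long mask;

	if (idx >= t->nr_pages)
		return false;

	mask = 1UL << (idx % BITS_PER_WORD);
	if (t->bits[idx / BITS_PER_WORD] & mask)
		return false;

	t->bits[idx / BITS_PER_WORD] |= mask;
	return true;
}

static void tracker_free(struct page_tracker *t)
{
	free(t->bits);
	t->bits = NULL;
	t->nr_pages = 0;
}
```

    Both the request path and the final copy phase would call
    tracker_claim() before copying, so a page that was already pushed
    (for example a VDSO page) is skipped the second time.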
    
    The use case of using userfaultfd with a checkpointed process on a
    remote machine will probably benefit from the current work related to
    image-cache and image-proxy.
    
    For the final implementation it would be nice to have a restore running
    in uffd mode on one system which requests the memory pages over the
    network from another system which is running 'criu checkpoint' also in
    uffd mode. This way the pages need to be copied only 'once' from the
    checkpoint process to the uffd restore process.
    
    TODO:
        * Still contains a lot of debug output which needs to be cleaned up.
        * Maybe transfer the dump directory FD also via unix domain sockets
          so that the 'uffd'/'lazy-pages' server can keep running without
          the need to specify the dump directory with '-D'
        * Keep the lazy-pages server running after all pages have been
          transferred and start waiting for new connections to serve.
        * Resurrect the non-cooperative patch set, as the whole thing
          breaks once the restored task calls fork() or mremap().
        * Figure out if current VDSO handling is correct.
        * Figure out when and how zero pages need to be inserted via uffd.
    
    v2:
        * provide option '--lazy-pages' to enable uffd style restore
        * use send_fd()/recv_fd() provided by criu (instead of own
          implementation)
        * do not install the uffd as service_fd
        * use named constants for MAP_ANONYMOUS
        * do not restore memory pages and then later mark them as uffd
          handled
        * remove function find_pages() to search in pages-<id>.img;
          now using criu functions to find the necessary pages;
          for each new page search the pages-<id>.img file is opened
        * only check the UFFDIO_API once
        * trying to protect uffd code by CONFIG_UFFD;
          use make UFFD=1 to compile criu with this patch
    
    v3:
       * renamed the server mode from 'uffd' -> 'lazy-pages'
       * switched client and server roles transferring the UFFD FD
         * the criu part running in lazy-pages server mode is now
           waiting for connections
         * the criu restore process connects to the lazy-pages server
           to pass the UFFD FD
       * before anything else is copied via UFFD, the VDSO pages are
         copied, as copying unused VDSO pages fails once the process is
         running. This was necessary to be able to copy all pages.
       * if there are no more UFFD messages for 5 seconds the lazy-pages
         server switches into copy mode to copy all remaining pages, which
         have not been requested yet, into the restored process
       * check the UFFDIO_API at the correct place
       * close UFFD FD in the restorer to remove open UFFD FD in the
         restored process
    
    v4:
        * removed unnecessary madvise() calls; they seemed necessary when
          first running tests with uffd, but they actually are not
        * auto-detect if build-system provides linux/userfaultfd.h
          header.
        * simplify unix domain socket setup and communication.
        * use --address to specify the location of the used
          unix domain socket.
    
    v5:
        * split the userfaultfd patch in multiple smaller patches
        * introduced vma_can_be_lazy() function to check if a page
          can be handled by uffd
        * moved uffd related code from cr-restore.c to uffd.c
        * handle failure to register a memory page of the restored process
          with userfaultfd
    
    v6:
        * get the PID of the process to be restored from the 'criu restore'
          process; first the PID is transferred and then the UFFD
    
    Signed-off-by: Adrian Reber <areber@redhat.com>
    Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Name
Documentation
compel
contrib
coredump
crit
criu
images
include/common
lib
scripts
soccr
test
.gitignore
.mailmap
.travis.yml
COPYING
CREDITS
INSTALL.md
Makefile
Makefile.compel
Makefile.config
Makefile.install
Makefile.versions
README.md