Commits · f68e5a6b3dd669cafab3cd3e5c556363d6ac357b · zhul / criu

16 Sep, 2017 27 commits

criu: always enable the userfaultfd support · f68e5a6b

Andrew Vagin authored Apr 27, 2016

Add linux/userfaultfd.h to criu sources. This header is a part
of the kernel API and I see nothing wrong with having it in the repo.

Why we want to do this:
* to check that criu works correctly if a kernel doesn't
  support userfaultfd.
* to check compilation of the userfaultfd part in travis-ci.

v2: remove UFFD from FEATURES_LIST
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

f68e5a6b

lazy-pages: allow handling multiple processes · cefb69c9

Mike Rapoport authored Apr 18, 2016

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

cefb69c9

lazy-pages: move most of lazy_pages_info initialization to ud_open · 7f0b0754

Mike Rapoport authored Apr 18, 2016

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

7f0b0754

lazy-pages: use epoll instead of select · a20e5305

Mike Rapoport authored Apr 18, 2016

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

a20e5305

lazy-pages: introduce struct lazy_pages_info · b9596b56

Mike Rapoport authored Apr 18, 2016

for holding state related to userfaultfd handling
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

b9596b56

lazy-pages: introduce lazy_pages_summary · 6490d5d3

Mike Rapoport authored Apr 18, 2016

It verifies that the amount of collected and transferred pages is consistent
and prints a summary
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

6490d5d3

lazy-pages: introduce handle_user_fault · 0b901451

Mike Rapoport authored Apr 18, 2016

It will handle page fault notifications from userfaultfd
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

0b901451

UFFD related makefiles cleanup · 10ef4276

Mike Rapoport authored Apr 18, 2016

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

10ef4276

lazy-pages: initialize process tree early · 7c78a48b

Mike Rapoport authored Apr 11, 2016

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

7c78a48b

lazy-pages: refactor unix socket initialization · 1c54c003

Mike Rapoport authored Apr 11, 2016

so that the listening file descriptor might be used in select/poll
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

1c54c003

lazy-pages: always compile uffd.c · d08ea98b

Mike Rapoport authored Apr 11, 2016

If CONFIG_HAS_UFFD is not defined, an attempt to run the lazy pages daemon
will result in an error message
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

d08ea98b

Rename uffd_listen to cr_lazy_pages · 34693cb4

Mike Rapoport authored Apr 11, 2016

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

34693cb4

ppc64le: fix build with UFFD · 8337e998

Mike Rapoport authored Apr 12, 2016

The __u64 is 'unsigned long' on Power and 'unsigned long long' on x86_64.
Using PRI?64 does not help because, for instance, PRIu64 is 'lu'.

According to [1] the solution is to define __SANE_USERSPACE_TYPES__ for
Power builds

[1] http://thread.gmane.org/gmane.linux.kernel/1425475/focus=1427433
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

8337e998
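A minimal sketch of the approach described in the ppc64le fix above, assuming a Linux build with kernel headers installed. The helper name format_u64 is hypothetical and is not taken from the criu sources; the only point is that __SANE_USERSPACE_TYPES__ has to be defined before the first kernel header is included, so that __u64 is 'unsigned long long' and "%llu" is a valid format on both x86_64 and Power.

```c
/* Must be defined before any kernel header is pulled in. */
#define __SANE_USERSPACE_TYPES__

#include <assert.h>
#include <stdio.h>
#include <linux/types.h>

/*
 * Hypothetical helper: print a __u64 with a single format string.
 * With __SANE_USERSPACE_TYPES__ defined, __u64 is 'unsigned long long'
 * on Power as well, so "%llu" matches on both architectures.
 */
static int format_u64(char *buf, size_t len, __u64 value)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%llu", value);
}
```

In a real build the define would more likely be passed as a compiler flag for Power targets, so that no source file can include a kernel header too early.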

uffd: add handling of zero pages · 6273fc97

Mike Rapoport authored Mar 31, 2016

When get_page returns 0, it means that a page is mapped by a vma but it is
not found in the pagemap. This happens when a page is a zero page and
therefore skipped by dump.
Use UFFDIO_ZEROPAGE to create a zero page in the restored process address
space.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

6273fc97
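The zero-page handling described above might look roughly like the following sketch. It assumes a userfaultfd descriptor that has already been set up and registered elsewhere. The names page_base and resolve_fault are hypothetical and do not come from uffd.c; UFFDIO_COPY, UFFDIO_ZEROPAGE and the two structs are the ones defined in linux/userfaultfd.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Round a faulting address down to the start of its page. */
static uint64_t page_base(uint64_t addr, uint64_t page_size)
{
	/* page_size is assumed to be a power of two */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return addr & ~(page_size - 1);
}

/*
 * Hypothetical helper: resolve one fault.
 * src != NULL: the page was found in the dump, copy it with UFFDIO_COPY.
 * src == NULL: the page was skipped by dump as a zero page, so ask the
 *              kernel to map a zero page with UFFDIO_ZEROPAGE.
 * Returns the ioctl() result: 0 on success, -1 with errno set on error.
 */
static int resolve_fault(int uffd, uint64_t fault_addr,
			 const void *src, uint64_t page_size)
{
	uint64_t dst = page_base(fault_addr, page_size);

	if (src) {
		struct uffdio_copy copy;

		memset(&copy, 0, sizeof(copy));
		copy.dst = dst;
		copy.src = (uint64_t)(uintptr_t)src;
		copy.len = page_size;
		return ioctl(uffd, UFFDIO_COPY, &copy);
	} else {
		struct uffdio_zeropage zero;

		memset(&zero, 0, sizeof(zero));
		zero.range.start = dst;
		zero.range.len = page_size;
		return ioctl(uffd, UFFDIO_ZEROPAGE, &zero);
	}
}
```

Keeping both cases behind one entry point mirrors the idea of a single page handler that can serve either kind of request.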

uffd: introduce uffd_handle_page · 58edba63

Mike Rapoport authored Mar 31, 2016

so that it'll be able to handle both UFFDIO_COPY and UFFDIO_ZEROPAGE
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

58edba63

uffd: increment uffd_copied_pages only in one place · e3f05ea0

Mike Rapoport authored Mar 31, 2016

The uffd_copied_pages can be incremented in uffd_copy_page function rather
than in its callers
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

e3f05ea0

uffd.c: move the code out of the 'main' function · a7004002

Adrian Reber authored Mar 24, 2016

Most of the UFFD logic was in the function uffd_listen() which was
directly called from crtools.c. In preparation for the remote lazy
restore most of the code has been moved to a separate function for better
integration of the network functionality.
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

a7004002

uffd.c: make some variable static global · b3ae1cc2

Adrian Reber authored Mar 24, 2016

To better track how many pages have been handled by UFFD a few variables
have been made static globals so that they are easier to access and fewer
parameters need to be passed around.
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

b3ae1cc2

uffd.c: move code into subfunctions · 048b31b2

Adrian Reber authored Mar 24, 2016

uffd_listen() is a rather large function and this starts to move code
into subfunctions.
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

048b31b2

uffd.c: remove unused variable vma_size · c3abfff0

Adrian Reber authored Mar 24, 2016

The variable vma_size was used for early debugging of lazy restore and
has no significance now.
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

c3abfff0

uffd: remove handling of VDSO pages · fcaf36f5

Mike Rapoport authored Mar 23, 2016

Since VDSO pages cannot be lazy, no need to take care of them in lazy-pages
daemon.
Signed-off-by: Mike Rapoport <rapoport@il.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

fcaf36f5

uffd: do not treat VDSO pages as lazy · 9c7970c2

Mike Rapoport authored Mar 23, 2016

VDSO is just a few pages and they can be loaded directly rather than go
through userfaultfd to save some complexity on the lazy-pages daemon side.
Signed-off-by: Mike Rapoport <rapoport@il.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

9c7970c2

uffd.c: do not call unneeded functions · 33696ceb

Adrian Reber authored Mar 18, 2016

For a lazy restore via userfaultfd the lazy-pages daemon
needs to know which pages exist, so that it knows when all
pages have finally been migrated so that the restored process
has all of its memory. Therefore it needs to know which pages
exist and it needs to parse the files in the dump result directory.

The existing criu functions are designed to be used by a 'normal'
restore and thus make a lot of assumptions about what has to be set up.

For the lazy-pages restore the complete 'restore' initialization is
not necessary and therefore the criu common code dependencies are
minimized with this commit.
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

33696ceb

Fix userfaultfd code with newer compilers · b8f46c36

Adrian Reber authored Mar 17, 2016

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

b8f46c36

Try to include userfaultfd with criu (part 2) · 57891afc

Adrian Reber authored Mar 15, 2016

This is a first try to include userfaultfd with criu. Right now it
still requires a "normal" checkpoint. After checkpointing the
application it can be restored with the help of userfaultfd.

All restored pages with MAP_ANONYMOUS and MAP_PRIVATE set are marked as
being handled by userfaultfd.

As soon as the process is restored it blocks on the first memory access
and waits for pages being transferred by userfaultfd.

To handle the required pages a new criu command has been added. For a
userfaultfd supported restore the first step is to start the
'lazy-pages' server:

  criu lazy-pages -v4 -D /tmp/3/ --address /tmp/userfault.socket

This waits on a unix domain socket (defined using the --address option)
to receive a userfaultfd file descriptor from a '--lazy-pages' enabled
'criu restore':

  criu restore -D /tmp/3 -j -v4 --lazy-pages \
  --address /tmp/userfault.socket
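Passing a file descriptor over a unix domain socket is done with an SCM_RIGHTS control message. The sketch below shows the general technique only: the function names send_one_fd and recv_one_fd are hypothetical, and criu's own send_fd()/recv_fd() helpers may differ in signature and behaviour.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_cmsg {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send one descriptor over a connected unix socket; 0 on success. */
static int send_one_fd(int sock, int fd)
{
	char dummy = 'F';	/* at least one byte of real data is sent */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(fd >= 0);
	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns the new descriptor or -1. */
static int recv_one_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The received descriptor is a new number in the receiving process that refers to the same open file, which is what lets the lazy-pages server read fault events for the restored task.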

In the first step the VDSO pages are pushed from the lazy-pages server
into the restored process. After that the lazy-pages server waits on the
UFFD FD for a UFFD requested page. If there are no requests received
during a period of 5 seconds the lazy-pages server switches into a mode
where the remaining, non-transferred pages are copied into the
destination process. After all remaining pages have been copied the
lazy-pages server exits.
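The switch from request mode to copy mode after an idle period can be sketched with poll() and a timeout. This is an illustration under assumptions, not the code in uffd.c: the names wait_for_request and serve_until_idle are hypothetical, each request is modelled as a single byte on a plain descriptor instead of a userfaultfd message, and the caller would pass 5000 ms to match the 5 seconds mentioned above.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

enum wait_result { WAIT_ERROR = -1, WAIT_IDLE = 0, WAIT_REQUEST = 1 };

/* Wait for one request, or report that the descriptor stayed idle. */
static enum wait_result wait_for_request(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	assert(fd >= 0);
	ret = poll(&pfd, 1, timeout_ms);
	if (ret < 0)
		return WAIT_ERROR;
	if (ret == 0)
		return WAIT_IDLE;	/* timeout: nothing was requested */
	return (pfd.revents & POLLIN) ? WAIT_REQUEST : WAIT_ERROR;
}

/*
 * Serve requests until none arrives within timeout_ms.
 * Returns the number of requests handled, or -1 on error. After this
 * returns, the caller would copy all pages not requested so far.
 */
static int serve_until_idle(int fd, int timeout_ms)
{
	int handled = 0;

	for (;;) {
		char req;
		enum wait_result w = wait_for_request(fd, timeout_ms);

		if (w == WAIT_IDLE)
			return handled;
		if (w == WAIT_ERROR)
			return -1;
		if (read(fd, &req, 1) != 1)
			return -1;
		handled++;	/* a real server would transfer the page here */
	}
}
```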

The first page that usually is requested is a VDSO page. The process
currently used for restoring has two VDSO pages, but only one is
requested via userfaultfd. In the second part where the remaining pages
are copied into the process, the second VDSO page is also copied into
the process as it has not been requested previously. Unfortunately, even
as this page has not been requested before, it is not accepted by
userfaultfd. EINVAL is returned. The reason for EINVAL is not understood
and therefore the VDSO pages are copied first into the process, then
switching to request mode and copying the pages which are requested via
userfaultfd. To decide at which point the VDSO pages can be copied into
the process, the lazy-pages server is currently waiting for the first
page requested via userfaultfd. This is one of the VDSO pages. To not
copy a page a second time, which is unnecessary and not possible, there
is now a check to see if the page has been transferred previously.

The use case to use userfaultfd with a checkpointed process on a remote
machine will probably benefit from the current work related to
image-cache and image-proxy.

For the final implementation it would be nice to have a restore running
in uffd mode on one system which requests the memory pages over the
network from another system which is running 'criu checkpoint' also in
uffd mode. This way the pages need to be copied only 'once' from the
checkpoint process to the uffd restore process.

TODO:
    * Contains still many debug outputs which need to be cleaned up.
    * Maybe transfer the dump directory FD also via unix domain sockets
      so that the 'uffd'/'lazy-pages' server can keep running without
      the need to specify the dump directory with '-D'
    * Keep the lazy-pages server running after all pages have been
      transferred and start waiting for new connections to serve.
    * Resurrect the non-cooperative patch set, as once the restored task
      fork()'s or calls mremap() the whole thing becomes broken.
    * Figure out if current VDSO handling is correct.
    * Figure out when and how zero pages need to be inserted via uffd.

v2:
    * provide option '--lazy-pages' to enable uffd style restore
    * use send_fd()/recv_fd() provided by criu (instead of own
      implementation)
    * do not install the uffd as service_fd
    * use named constants for MAP_ANONYMOUS
    * do not restore memory pages and then later mark them as uffd
      handled
    * remove function find_pages() to search in pages-<id>.img;
      now using criu functions to find the necessary pages;
      for each new page search the pages-<id>.img file is opened
    * only check the UFFDIO_API once
    * trying to protect uffd code by CONFIG_UFFD;
      use make UFFD=1 to compile criu with this patch

v3:
   * renamed the server mode from 'uffd' -> 'lazy-pages'
   * switched client and server roles transferring the UFFD FD
     * the criu part running in lazy-pages server mode is now
       waiting for connections
     * the criu restore process connects to the lazy-pages server
       to pass the UFFD FD
   * before UFFD copying anything else the VDSO pages are copied
     as it fails to copy unused VDSO pages once the process is running.
     this was necessary to be able to copy all pages.
   * if there are no more UFFD messages for 5 seconds the lazy-pages
     server switches in copy mode to copy all remaining pages, which
     have not been requested yet, into the restored process
   * check the UFFDIO_API at the correct place
   * close UFFD FD in the restorer to remove open UFFD FD in the
     restored process

v4:
    * removed unnecessary madvise() calls ; it seemed necessary when
      first running tests with uffd; it actually is not necessary
    * auto-detect if build-system provides linux/userfaultfd.h
      header.
    * simplify unix domain socket setup and communication.
    * use --address to specify the location of the used
      unix domain socket.

v5:
    * split the userfaultfd patch in multiple smaller patches
    * introduced vma_can_be_lazy() function to check if a page
      can be handled by uffd
    * moved uffd related code from cr-restore.c to uffd.c
    * handle failure to register a memory page of the restored process
      with userfaultfd

v6:
    * get PID of to be restored process from the 'criu restore' process;
      first the PID is transferred and then the UFFD
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

57891afc

Try to include userfaultfd with criu (part 1) · e2268aa3

Adrian Reber authored Mar 15, 2016

This is a first try to include userfaultfd with criu. Right now it
still requires a "normal" checkpoint. After checkpointing the
application it can be restored with the help of userfaultfd.

All restored pages with MAP_ANONYMOUS and MAP_PRIVATE set are marked as
being handled by userfaultfd.

As soon as the process is restored it blocks on the first memory access
and waits for pages being transferred by userfaultfd.

To handle the required pages a new criu command has been added. For a
userfaultfd supported restore the first step is to start the
'lazy-pages' server:

  criu lazy-pages -v4 -D /tmp/3/ --address /tmp/userfault.socket

This is part 1 of the userfaultfd integration which provides the
'lazy-pages' server implementation.

v2:
    * provide option '--lazy-pages' to enable uffd style restore
    * use send_fd()/recv_fd() provided by criu (instead of own
      implementation)
    * do not install the uffd as service_fd
    * use named constants for MAP_ANONYMOUS
    * do not restore memory pages and then later mark them as uffd
      handled
    * remove function find_pages() to search in pages-<id>.img;
      now using criu functions to find the necessary pages;
      for each new page search the pages-<id>.img file is opened
    * only check the UFFDIO_API once
    * trying to protect uffd code by CONFIG_UFFD;
      use make UFFD=1 to compile criu with this patch

v3:
   * renamed the server mode from 'uffd' -> 'lazy-pages'
   * switched client and server roles transferring the UFFD FD
     * the criu part running in lazy-pages server mode is now
       waiting for connections
     * the criu restore process connects to the lazy-pages server
       to pass the UFFD FD
   * before UFFD copying anything else the VDSO pages are copied
     as it fails to copy unused VDSO pages once the process is running.
     this was necessary to be able to copy all pages.
   * if there are no more UFFD messages for 5 seconds the lazy-pages
     server switches in copy mode to copy all remaining pages, which
     have not been requested yet, into the restored process
   * check the UFFDIO_API at the correct place
   * close UFFD FD in the restorer to remove open UFFD FD in the
     restored process

v4:
    * removed unnecessary madvise() calls ; it seemed necessary when
      first running tests with uffd; it actually is not necessary
    * auto-detect if build-system provides linux/userfaultfd.h
      header
    * simplify unix domain socket setup and communication.
    * use --address to specify the location of the used
      unix domain socket

v5:
    * split the userfaultfd patch in multiple smaller patches
    * introduced vma_can_be_lazy() function to check if a page
      can be handled by uffd
    * moved uffd related code from cr-restore.c to uffd.c
    * handle failure to register a memory page of the restored process
      with userfaultfd

v6:
    * get PID of to be restored process from the 'criu restore' process;
      first the PID is transferred and then the UFFD
    * code has been re-ordered to be better prepared for lazy-restore
      from remote host
    * compile test for UFFD availability only once
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

e2268aa3

Remove static from prepare_task_entries function · 27e60179

Adrian Reber authored Mar 15, 2016

For the upcoming userfaultfd integration the lazy-pages mode needs to
setup the criu infrastructure to read the pages files.
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

27e60179

30 Aug, 2017 2 commits

ppc64le: travis: fixup Ubuntu repositories · 98ac646f

Mike Rapoport authored Aug 24, 2017

The ppc64le docker image has broken /etc/apt/sources.list. A small fixup to
it allows running ppc64le tests.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

98ac646f

test/ptrace_sig: wait children before calling test_daemon · 835e252b

Andrei Vagin authored Aug 12, 2017

A static test has to do nothing after test_daemon().
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

835e252b

21 Aug, 2017 1 commit

criu: Version 3.4 · a31c1854

Pavel Emelyanov authored Aug 21, 2017

The biggest new thing this time is s390x arch support!
Also we have several improvements and a set of bugfixes
as usual.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

a31c1854

17 Aug, 2017 1 commit

Added badges to the title page · 84492ae2

Pavel Emelyanov authored Aug 17, 2017

Travis statuses for master and criu-dev and codacy grade.

84492ae2

16 Aug, 2017 6 commits

restore: Fix deadlock when helper's child dies · 204c1ef9

Dmitry Safonov authored Jul 20, 2017

Since commit ced9c529 ("restore: fix race with helpers' kids dying
too early"), we block SIGCHLD in helper tasks before CR_STATE_RESTORE.
This way we avoided the default criu sighandler as it doesn't expect
that children may die.

This is very racy as we wait on a futex for another stage to be started,
but the next stage may start only when all the tasks complete the
previous stage. If a child of the helper dies, the helper may already
have blocked SIGCHLD and have started sleeping on the futex. Then the
next stage never comes and no one reads the pending SIGCHLD for the
helper.

A customer met this situation on a node where the following
(unrelated) problem had occurred:
Unable to send a fin packet: libnet_write_raw_ipv6(): -1 bytes written (Network is unreachable)
Then the helper's child criu exited with an error code and the
lockup happened.

While we could fix it by aborting the futex at the end of
restore_task_with_children() for every task (non-root ones too),
that would not be completely correct:
1. All futex-waiting tasks will wake up after that and they
may not expect that some tasks are on the previous stage,
so they will spam the logs with unrelated errors and may
also die painfully.
2. A child may die and miss aborting the futex due to:
o segfault
o OOM killer
o User-sent SIGKILL
o Other error paths we forgot to cover with a futex abort

To fix this deadlock in TASK_HELPER, as suggested by Kirill,
let's check if any children deaths are expected. If there
aren't any, don't block SIGCHLD; otherwise wait() and check if
the death was on an expected stage of restore (not CR_STATE_RESTORE).
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

Conflicts:
criu/cr-restore.c

204c1ef9

util: Add block_sigmask/unblock_sigmask helpers · 8a270e58

Pavel Emelyanov authored Aug 16, 2017

This is an extract from Kirill Tkhai's patch
87464739 (restore: Block SIGCHLD during root_item initialization)
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

8a270e58

restorer: Report child's death reason correctly · 96602ba8

Dmitry Safonov authored Jun 05, 2017

E.g., if a child was killed by SIGSEGV, this message
previously was "exited, status=11", because si_code == CLD_DUMPED == 3
in this case and (si_code & CLD_EXITED) == (si_code & 1) is nonzero.
This is misleading as you may try to look for exit() calls
with 11 as the argument.
Correct the if statement to compare si_code with CLD_*.
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

96602ba8

images: netdev -- Adjust venet comment · 59c95839

Cyrill Gorcunov authored Mar 17, 2017

venet is not virtuozzo specific but rather
came from openvz, make it so.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

59c95839

page-xfer: handle a case when splice returns zero · 625552f0

Andrei Vagin authored Jul 20, 2017

A return value of 0 means end of input, so we need to
stop reading from this descriptor.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

625552f0

restore: handle errors of restore_wait_other_tasks · 89d10280

Andrei Vagin authored Jun 20, 2017

In an error case, task_entries->nr_in_progress is set to -1
and we have to handle this case.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

89d10280

15 Aug, 2017 3 commits

compel,s390: Fix setting regs for tasks · 181c44aa

Pavel Emelyanov authored Aug 15, 2017

Item's thread struct pid is not a pointer in master.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

181c44aa

util: report a command name in error messages for cr_system() · e2e1dbf4

Andrei Vagin authored Aug 09, 2017

It is good to know which command was executed.

https://github.com/xemul/criu/issues/371
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

e2e1dbf4

page-server: Tune up the encode/decode helpers · 943df7d6

Pavel Emelyanov authored Jul 14, 2017

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

943df7d6