- 16 Sep, 2017 27 commits
-
-
Andrew Vagin authored
Add linux/userfaultfd.h to criu sources. This header is part of the kernel API, and I see nothing wrong with having it in the repo. Why we want to do this:
* to check that criu works correctly if a kernel doesn't support userfaultfd.
* to check compilation of the userfaultfd part in travis-ci.
v2: remove UFFD from FEATURES_LIST
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
for holding state related to userfaultfd handling.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
It verifies that the amount of collected and transferred pages is consistent and prints a summary.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
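A minimal sketch of such a consistency check, with hypothetical counter names (this log does not show criu's actual variables): compare the number of pages collected from the images with the number transferred, and print a summary either way.

```c
#include <stdio.h>

/* Hypothetical counters; criu's real variable names may differ. */
struct lazy_stats {
	unsigned long collected;   /* pages found in the dump images */
	unsigned long transferred; /* pages copied or zero-filled via uffd */
};

/* Print a summary and report whether every collected page was transferred.
 * Returns 0 if the counts match, -1 otherwise. */
static int lazy_stats_check(const struct lazy_stats *st, FILE *log)
{
	fprintf(log, "lazy-pages: %lu pages collected, %lu transferred\n",
		st->collected, st->transferred);
	if (st->collected != st->transferred) {
		fprintf(log, "lazy-pages: page count mismatch\n");
		return -1;
	}
	return 0;
}
```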
-
Mike Rapoport authored
It will handle page fault notifications from userfaultfd.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
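Page fault notifications arrive as struct uffd_msg records that are read() from the userfaultfd. A minimal sketch of decoding one, assuming a non-blocking descriptor; uffd_next_fault() is a hypothetical name, not criu's handler.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: read one event from a userfaultfd and, if it is a
 * page fault, store the page-aligned faulting address in *page.
 * Returns 1 for a page fault, 0 for other events or no pending data,
 * -1 on error. */
static int uffd_next_fault(int uffd, uint64_t page_size, uint64_t *page)
{
	struct uffd_msg msg;
	ssize_t n = read(uffd, &msg, sizeof(msg));

	if (n < 0)
		return (errno == EAGAIN) ? 0 : -1; /* nothing queued yet */
	if (n != (ssize_t)sizeof(msg))
		return -1; /* short read: not a whole message */
	if (msg.event != UFFD_EVENT_PAGEFAULT)
		return 0;
	/* Faults are resolved a whole page at a time. */
	*page = msg.arg.pagefault.address & ~(page_size - 1);
	return 1;
}
```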
-
Mike Rapoport authored
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
so that the listening file descriptor can be used in select/poll.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
If CONFIG_HAS_UFFD is not defined, an attempt to run the lazy-pages daemon results in an error message.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
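This pattern can be sketched as a compile-time stub: when CONFIG_HAS_UFFD is absent, the entry point still exists but only reports an error. lazy_pages_main() is a hypothetical name, and the message text is illustrative.

```c
#include <stdio.h>

/* CONFIG_HAS_UFFD is the build flag named in the commit;
 * lazy_pages_main() is a hypothetical entry point. */
#ifdef CONFIG_HAS_UFFD
static int lazy_pages_main(void)
{
	/* ... the real daemon: receive the uffd and serve page faults ... */
	return 0;
}
#else
static int lazy_pages_main(void)
{
	fprintf(stderr, "lazy-pages: userfaultfd support was not compiled in\n");
	return -1;
}
#endif
```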
-
Mike Rapoport authored
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
The __u64 type is 'unsigned long' on Power and 'unsigned long long' on x86_64. Using the PRI?64 macros does not help because, for instance, PRIu64 is 'lu'. According to [1], the solution is to define __SANE_USERSPACE_TYPES__ for Power builds.
[1] http://thread.gmane.org/gmane.linux.kernel/1425475/focus=1427433
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
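A sketch of this approach: defining __SANE_USERSPACE_TYPES__ before any kernel header is included makes __u64 'unsigned long long' on ppc64 as well, so a single %llu format matches it on both architectures. format_u64() is a hypothetical helper used only for illustration.

```c
/* Must precede every header that may pull in <asm/types.h>. */
#define __SANE_USERSPACE_TYPES__
#include <linux/types.h>
#include <stdio.h>

/* With the define above, __u64 is 'unsigned long long' on both x86_64
 * and ppc64, so "%llu" matches it without a cast. */
static int format_u64(char *buf, size_t len, __u64 val)
{
	return snprintf(buf, len, "%llu", val);
}
```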
-
Mike Rapoport authored
When get_page returns 0, it means that a page is mapped by a VMA but is not found in the pagemap. This happens when the page is a zero page and is therefore skipped by dump. Use UFFDIO_ZEROPAGE to create a zero page in the restored process address space.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
so that it'll be able to handle both UFFDIO_COPY and UFFDIO_ZEROPAGE.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
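One way a single helper can serve both cases: copy page contents with UFFDIO_COPY when the page was found in the images, and map a zero page with UFFDIO_ZEROPAGE when it was not. This is a sketch assuming <linux/userfaultfd.h>; uffd_fill_page() and its NULL-means-zero-page convention are hypothetical and are not the signature of criu's function.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical helper: resolve a fault on the page at 'addr' either by
 * copying 'src' (UFFDIO_COPY) or, when src is NULL because the dump
 * skipped a zero page, by mapping a zero page (UFFDIO_ZEROPAGE).
 * Returns 0 on success, -1 with errno set by ioctl() on failure. */
static int uffd_fill_page(int uffd, uint64_t addr, const void *src,
			  uint64_t page_size)
{
	if (src) {
		struct uffdio_copy copy = {
			.dst = addr,
			.src = (uint64_t)(uintptr_t)src,
			.len = page_size,
			.mode = 0,
		};
		return ioctl(uffd, UFFDIO_COPY, &copy) < 0 ? -1 : 0;
	} else {
		struct uffdio_zeropage zp = {
			.range = { .start = addr, .len = page_size },
			.mode = 0,
		};
		return ioctl(uffd, UFFDIO_ZEROPAGE, &zp) < 0 ? -1 : 0;
	}
}
```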
-
Mike Rapoport authored
The uffd_copied_pages counter can be incremented in the uffd_copy_page function rather than in its callers.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Adrian Reber authored
Most of the UFFD logic was in the function uffd_listen(), which was called directly from crtools.c. In preparation for the remote lazy restore, most of the code has been moved to separate functions for better integration of the network functionality.
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Adrian Reber authored
To better track how many pages have been handled by UFFD, a few variables have been made static globals, to access them more easily and to reduce the number of parameters passed around.
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Adrian Reber authored
uffd_listen() is a rather large function; this starts moving code into subfunctions.
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Adrian Reber authored
The variable vma_size was used for early debugging of lazy restore and has no significance now.
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
Since VDSO pages cannot be lazy, there is no need to take care of them in the lazy-pages daemon.
Signed-off-by: Mike Rapoport <rapoport@il.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
VDSO is just a few pages, and they can be loaded directly rather than going through userfaultfd, which saves some complexity on the lazy-pages daemon side.
Signed-off-by: Mike Rapoport <rapoport@il.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Adrian Reber authored
For a lazy restore via userfaultfd, the lazy-pages daemon needs to know which pages exist, so that it knows when all pages have finally been migrated and the restored process has all of its memory. To learn which pages exist, it needs to parse the files in the dump result directory. The existing criu functions are designed to be used by a 'normal' restore, and thus make a lot of assumptions about what has already been set up. The lazy-pages restore does not need the complete 'restore' initialization, so this commit minimizes its dependencies on the criu common code.
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
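Knowing which pages exist lets the daemon tell when the migration is complete and avoid transferring the same page twice. A hypothetical bitmap-based sketch of such bookkeeping for one memory range; criu's actual data structures are not shown in this log, and all names below are illustrative.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tracker for one lazy memory range: one bit per page,
 * set once the page has been transferred. */
struct page_tracker {
	uint64_t start;       /* first address of the range */
	uint64_t page_size;
	size_t nr_pages;
	size_t nr_left;       /* pages not transferred yet */
	unsigned char *bits;
};

static int tracker_init(struct page_tracker *t, uint64_t start,
			size_t nr_pages, uint64_t page_size)
{
	t->start = start;
	t->page_size = page_size;
	t->nr_pages = nr_pages;
	t->nr_left = nr_pages;
	t->bits = calloc((nr_pages + CHAR_BIT - 1) / CHAR_BIT, 1);
	return t->bits ? 0 : -1;
}

/* Mark the page containing addr as transferred. Returns 1 if it was new,
 * 0 if it had already been transferred (so it must not be copied again),
 * -1 if addr lies outside the range. */
static int tracker_mark(struct page_tracker *t, uint64_t addr)
{
	size_t idx;
	unsigned char bit;

	if (addr < t->start)
		return -1;
	idx = (size_t)((addr - t->start) / t->page_size);
	if (idx >= t->nr_pages)
		return -1;
	bit = (unsigned char)(1u << (idx % CHAR_BIT));
	if (t->bits[idx / CHAR_BIT] & bit)
		return 0;
	t->bits[idx / CHAR_BIT] |= bit;
	t->nr_left--;
	return 1;
}

/* All pages of the range have reached the restored process. */
static int tracker_done(const struct page_tracker *t)
{
	return t->nr_left == 0;
}

static void tracker_free(struct page_tracker *t)
{
	free(t->bits);
	t->bits = NULL;
}
```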
-
Adrian Reber authored
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Adrian Reber authored
This is a first try to include userfaultfd with criu. Right now it still requires a "normal" checkpoint. After checkpointing the application, it can be restored with the help of userfaultfd. All restored pages with MAP_ANONYMOUS and MAP_PRIVATE set are marked as being handled by userfaultfd. As soon as the process is restored, it blocks on the first memory access and waits for pages to be transferred by userfaultfd.

To handle the required pages, a new criu command has been added. For a userfaultfd-supported restore, the first step is to start the 'lazy-pages' server:

  criu lazy-pages -v4 -D /tmp/3/ --address /tmp/userfault.socket

This waits on a unix domain socket (defined using the --address option) to receive a userfaultfd file descriptor from a '--lazy-pages' enabled 'criu restore':

  criu restore -D /tmp/3 -j -v4 --lazy-pages \
      --address /tmp/userfault.socket

In the first step, the VDSO pages are pushed from the lazy-pages server into the restored process. After that, the lazy-pages server waits on the UFFD FD for a UFFD-requested page. If no requests are received during a period of 5 seconds, the lazy-pages server switches into a mode where the remaining, non-transferred pages are copied into the destination process. After all remaining pages have been copied, the lazy-pages server exits.

The first page that is usually requested is a VDSO page. The process currently used for restoring has two VDSO pages, but only one is requested via userfaultfd. In the second part, where the remaining pages are copied into the process, the second VDSO page is also copied into the process, as it has not been requested previously. Unfortunately, even though this page has not been requested before, it is not accepted by userfaultfd: EINVAL is returned. The reason for EINVAL is not understood, and therefore the VDSO pages are copied into the process first, before switching to request mode and copying the pages which are requested via userfaultfd.

To decide at which point the VDSO pages can be copied into the process, the lazy-pages server currently waits for the first page requested via userfaultfd. This is one of the VDSO pages. To avoid copying a page a second time, which is unnecessary and not possible, there is now a check to see if the page has been transferred previously.

The use case of userfaultfd with a checkpointed process on a remote machine will probably benefit from the current work related to image-cache and image-proxy. For the final implementation, it would be nice to have a restore running in uffd mode on one system which requests the memory pages over the network from another system which is running 'criu checkpoint', also in uffd mode. This way the pages need to be copied only 'once' from the checkpointed process to the uffd restore process.

TODO:
* It still contains many debug outputs which need to be cleaned up.
* Maybe transfer the dump directory FD also via unix domain sockets, so that the 'uffd'/'lazy-pages' server can keep running without the need to specify the dump directory with '-D'.
* Keep the lazy-pages server running after all pages have been transferred and start waiting for new connections to serve.
* Resurrect the non-cooperative patch set, as once the restored task fork()s or calls mremap() the whole thing becomes broken.
* Figure out if the current VDSO handling is correct.
* Figure out when and how zero pages need to be inserted via uffd.

v2:
* provide option '--lazy-pages' to enable uffd style restore
* use send_fd()/recv_fd() provided by criu (instead of own implementation)
* do not install the uffd as service_fd
* use named constants for MAP_ANONYMOUS
* do not restore memory pages and then later mark them as uffd handled
* remove function find_pages() to search in pages-<id>.img; now using criu functions to find the necessary pages; for each new page search the pages-<id>.img file is opened
* only check the UFFDIO_API once
* trying to protect uffd code by CONFIG_UFFD; use make UFFD=1 to compile criu with this patch

v3:
* renamed the server mode from 'uffd' -> 'lazy-pages'
* switched client and server roles transferring the UFFD FD
* the criu part running in lazy-pages server mode is now waiting for connections
* the criu restore process connects to the lazy-pages server to pass the UFFD FD
* before UFFD copies anything else, the VDSO pages are copied, as copying unused VDSO pages fails once the process is running; this was necessary to be able to copy all pages
* if there are no more UFFD messages for 5 seconds, the lazy-pages server switches into copy mode to copy all remaining pages, which have not been requested yet, into the restored process
* check the UFFDIO_API at the correct place
* close UFFD FD in the restorer to remove open UFFD FD in the restored process

v4:
* removed unnecessary madvise() calls; they seemed necessary when first running tests with uffd, but actually are not
* auto-detect if build-system provides linux/userfaultfd.h header
* simplify unix domain socket setup and communication
* use --address to specify the location of the used unix domain socket

v5:
* split the userfaultfd patch in multiple smaller patches
* introduced vma_can_be_lazy() function to check if a page can be handled by uffd
* moved uffd related code from cr-restore.c to uffd.c
* handle failure to register a memory page of the restored process with userfaultfd

v6:
* get the PID of the process to be restored from the 'criu restore' process; first the PID is transferred and then the UFFD

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
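The switch from request mode to copy mode can be driven by a poll() timeout on the userfaultfd. A minimal sketch, assuming the 5-second idle period described in this commit; wait_for_request() and LAZY_IDLE_TIMEOUT_MS are hypothetical names, not criu's.

```c
#include <errno.h>
#include <poll.h>

/* Idle period after which the remaining pages are pushed; the constant
 * name is hypothetical, the 5 seconds come from the commit message. */
#define LAZY_IDLE_TIMEOUT_MS 5000

/* Hypothetical helper: wait up to timeout_ms for the next userfaultfd
 * request. Returns 1 if a request is ready to be read, 0 on timeout
 * (time to copy the remaining pages), -1 on error. */
static int wait_for_request(int uffd, int timeout_ms)
{
	struct pollfd pfd = { .fd = uffd, .events = POLLIN };
	int ret;

	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -1;
	if (ret == 0)
		return 0; /* idle: no fault arrived within the timeout */
	return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A daemon loop would call wait_for_request(uffd, LAZY_IDLE_TIMEOUT_MS) repeatedly, handle one fault when it returns 1, and fall through to copying all remaining pages when it returns 0. Restarting poll() after EINTR resets the timeout, which is acceptable for an idle detector but not exact.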
-
Adrian Reber authored
This is a first try to include userfaultfd with criu. Right now it still requires a "normal" checkpoint. After checkpointing the application, it can be restored with the help of userfaultfd. All restored pages with MAP_ANONYMOUS and MAP_PRIVATE set are marked as being handled by userfaultfd. As soon as the process is restored, it blocks on the first memory access and waits for pages to be transferred by userfaultfd.

To handle the required pages, a new criu command has been added. For a userfaultfd-supported restore, the first step is to start the 'lazy-pages' server:

  criu lazy-pages -v4 -D /tmp/3/ --address /tmp/userfault.socket

This is part 1 of the userfaultfd integration, which provides the 'lazy-pages' server implementation.

v2:
* provide option '--lazy-pages' to enable uffd style restore
* use send_fd()/recv_fd() provided by criu (instead of own implementation)
* do not install the uffd as service_fd
* use named constants for MAP_ANONYMOUS
* do not restore memory pages and then later mark them as uffd handled
* remove function find_pages() to search in pages-<id>.img; now using criu functions to find the necessary pages; for each new page search the pages-<id>.img file is opened
* only check the UFFDIO_API once
* trying to protect uffd code by CONFIG_UFFD; use make UFFD=1 to compile criu with this patch

v3:
* renamed the server mode from 'uffd' -> 'lazy-pages'
* switched client and server roles transferring the UFFD FD
* the criu part running in lazy-pages server mode is now waiting for connections
* the criu restore process connects to the lazy-pages server to pass the UFFD FD
* before UFFD copies anything else, the VDSO pages are copied, as copying unused VDSO pages fails once the process is running; this was necessary to be able to copy all pages
* if there are no more UFFD messages for 5 seconds, the lazy-pages server switches into copy mode to copy all remaining pages, which have not been requested yet, into the restored process
* check the UFFDIO_API at the correct place
* close UFFD FD in the restorer to remove open UFFD FD in the restored process

v4:
* removed unnecessary madvise() calls; they seemed necessary when first running tests with uffd, but actually are not
* auto-detect if build-system provides linux/userfaultfd.h header
* simplify unix domain socket setup and communication
* use --address to specify the location of the used unix domain socket

v5:
* split the userfaultfd patch in multiple smaller patches
* introduced vma_can_be_lazy() function to check if a page can be handled by uffd
* moved uffd related code from cr-restore.c to uffd.c
* handle failure to register a memory page of the restored process with userfaultfd

v6:
* get the PID of the process to be restored from the 'criu restore' process; first the PID is transferred and then the UFFD
* code has been re-ordered to be better prepared for lazy-restore from remote host
* compile test for UFFD availability only once

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
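The restore side hands the lazy-pages server the PID of the task being restored and the userfaultfd over the unix domain socket. criu uses its own send_fd()/recv_fd() helpers for the descriptor; the following is an independent sketch with plain sendmsg()/recvmsg() and SCM_RIGHTS, not criu's code. For brevity it carries the PID and the descriptor in a single message, whereas the commit transfers the PID first and the UFFD afterwards. Both function names are hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union fd_cmsg {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Hypothetical sender ('criu restore' side): the PID travels as ordinary
 * data, the userfaultfd as SCM_RIGHTS ancillary data.
 * Returns 0 on success, -1 on failure. */
static int send_pid_and_fd(int sock, pid_t pid, int fd)
{
	union fd_cmsg ctrl;
	struct iovec iov = { .iov_base = &pid, .iov_len = sizeof(pid) };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = ctrl.buf,
		.msg_controllen = sizeof(ctrl.buf),
	};
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == (ssize_t)sizeof(pid) ? 0 : -1;
}

/* Hypothetical receiver (lazy-pages side). The kernel installs a new
 * descriptor for the same open file and we store its number in *fd.
 * Returns 0 on success, -1 on failure. */
static int recv_pid_and_fd(int sock, pid_t *pid, int *fd)
{
	union fd_cmsg ctrl;
	struct iovec iov = { .iov_base = pid, .iov_len = sizeof(*pid) };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = ctrl.buf,
		.msg_controllen = sizeof(ctrl.buf),
	};
	struct cmsghdr *cmsg;

	if (recvmsg(sock, &msg, 0) != (ssize_t)sizeof(*pid))
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(fd, CMSG_DATA(cmsg), sizeof(int));
	return 0;
}
```

The received descriptor number generally differs from the sender's, but it refers to the same userfaultfd, so the server can read fault messages from it and resolve them in the restored task's memory.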
-
Adrian Reber authored
For the upcoming userfaultfd integration, the lazy-pages mode needs to set up the criu infrastructure to read the pages files.
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
- 30 Aug, 2017 2 commits
-
-
Mike Rapoport authored
The ppc64le docker image has a broken /etc/apt/sources.list. A small fixup to it allows running ppc64le tests.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
A static test has to do nothing after test_daemon().
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
- 21 Aug, 2017 1 commit
-
-
Pavel Emelyanov authored
The biggest new thing this time is s390x arch support! Also we have several improvements and a set of bugfixes, as usual.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
- 17 Aug, 2017 1 commit
-
-
Pavel Emelyanov authored
Travis statuses for master and criu-dev, and the Codacy grade.
-
- 16 Aug, 2017 6 commits
-
-
Dmitry Safonov authored
Since commit ced9c529 ("restore: fix race with helpers' kids dying too early"), we block SIGCHLD in helper tasks before CR_STATE_RESTORE. This way we avoided the default criu signal handler, as it doesn't expect that children may die.

This is very racy: we wait on a futex for the next stage to be started, but the next stage may start only when all the tasks complete the previous stage. If some child of a helper dies, the helper may already have blocked SIGCHLD and started sleeping on the futex. Then the next stage never comes, and no one reads the pending SIGCHLD for the helper.

A customer hit this situation on a node where the following (unrelated) problem had occurred:
  Unable to send a fin packet: libnet_write_raw_ipv6(): -1 bytes written (Network is unreachable)
Then the criu child of the helper exited with an error code, and the lockup happened.

While we could fix it by aborting the futex at the end of restore_task_with_children() for each task (non-root ones too), that would not be completely correct:
1. All futex-waiting tasks will wake up after that, and they may not expect that some tasks are still on the previous stage, so they will spam the logs with unrelated errors and may also die painfully.
2. A child may die and miss aborting the futex due to:
   o segfault
   o OOM killer
   o user-sent SIGKILL
   o another error path we forgot to cover with the futex abort

To fix this deadlock in TASK_HELPER, as suggested by Kirill, let's check whether any child deaths are expected: if there are none, don't block SIGCHLD; otherwise wait() and check whether the death happened at an expected stage of restore (not CR_STATE_RESTORE).

Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Conflicts: criu/cr-restore.c
-
Pavel Emelyanov authored
This is an extract from Kirill Tkhai's patch 87464739 (restore: Block SIGCHLD during root_item initialization).
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Dmitry Safonov authored
E.g., if a child was killed by SIGSEGV, this message previously read "exited, status=11": in this case si_code == CLD_DUMPED == 3, so a bitwise test such as (si_code & CLD_EXITED) is non-zero, because CLD_EXITED == 1. This is misleading, as you may go looking for exit() calls with 11 as the argument. Fix the check to compare si_code for equality with the CLD_* values.
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
venet is not Virtuozzo-specific but rather came from OpenVZ; make it so.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
A return value of 0 means end of input, so we need to stop reading from this descriptor.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
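A minimal sketch of a read loop that treats a zero return from read() as end of input instead of retrying forever; read_until_eof() is a hypothetical helper, not the function this commit changed.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read from fd until EOF or until 'len' bytes have
 * been read. Returns the number of bytes read, or -1 on error. */
static ssize_t read_until_eof(int fd, char *buf, size_t len)
{
	size_t off = 0;

	while (off < len) {
		ssize_t n = read(fd, buf + off, len - off);

		if (n == 0)
			break; /* end of input: the writer closed its end */
		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted by a signal, retry */
			return -1;
		}
		off += (size_t)n;
	}
	return (ssize_t)off;
}
```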
-
Andrei Vagin authored
In an error case, task_entries->nr_in_progress is set to -1, and we have to handle this case.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
- 15 Aug, 2017 3 commits
-
-
Pavel Emelyanov authored
The item's thread struct pid is not a pointer in master.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Andrei Vagin authored
It is good to know which commands were executed.
https://github.com/xemul/criu/issues/371
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Pavel Emelyanov authored
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-