- 16 Sep, 2017 40 commits
-
Pavel Emelyanov authored
Both _copy and _update_lazy_iovecs are called by hand once the data is ready. Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com>
-
Pavel Emelyanov authored
This flag means that PR_ASYNC is valid, but the I/O should be started as soon as possible. This is how the remote reader works, so the flag is mostly for the local reader. It will let us unify the page-fault handlers for the local and remote cases. Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com>
-
Pavel Emelyanov authored
We already have routines that do send-req, recv-info and recv-page, so there is no need for yet another one. Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Pavel Emelyanov authored
All the "lower" page-read-s should have already arrived with pre-dump. This fixes the combined scheme. Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com>
-
Mike Rapoport authored
The page transfer protocol is completely synchronous on the dump side, so we can presume that a POLLIN event on the page server socket is either the page info response for the last sent page request or the page data following the last page info. In the first case we set the ev_data associated with page server socket events to the values received in receive_remote_page_info, and in the second case we reset ev_data to zero. This lets us tell why page_server_event was called. travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
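The ev_data convention described in this commit can be pictured as a two-state toggle. The following is only a minimal sketch under assumptions: `struct ps_sock`, `enum ps_event` and `ps_sock_pollin` are hypothetical names, and the value stored on a page info response is assumed to be non-zero (for example a page count).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the state attached to the page server socket. */
struct ps_sock {
	uint64_t ev_data; /* 0: next POLLIN carries page info; non-zero: page data is pending */
};

enum ps_event { PS_EVENT_PAGE_INFO, PS_EVENT_PAGE_DATA };

/*
 * Called for every POLLIN on the socket. info_value stands for whatever
 * was received with the page info (assumed non-zero); it is ignored when
 * the event turns out to be page data.
 */
static enum ps_event ps_sock_pollin(struct ps_sock *s, uint64_t info_value)
{
	if (s->ev_data == 0) {
		/* Nothing pending, so this must be the page info response. */
		assert(info_value != 0);
		s->ev_data = info_value;
		return PS_EVENT_PAGE_INFO;
	}

	/* Page info was seen before, so this is the page data; reset the state. */
	s->ev_data = 0;
	return PS_EVENT_PAGE_DATA;
}
```

The toggle is only sound because the protocol is synchronous: at most one request is in flight, so the two kinds of event strictly alternate.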
-
Mike Rapoport authored
The synchronous remote page transfer prevents reception of uffd events while communicating with the page server on the dump side. Adding the socket file descriptor to the set polled by epoll_wait allows incoming uffd events to be processed after a non-blocking request for a remote page is issued and before the dump side page server replies. travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
The asynchronous version of remote page_read will send the request to the dump side and return happily. The response will be handled by uffd.c because its epoll loop is the only place where we can handle events. travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
This part of the code is responsible for resetting the pagemap to the proper location and mapping the requested address to the zero pfn if needed. The upcoming additions to uffd.c will reuse this code. travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
For asynchronous page transfers in post-copy migration we need to be able to request remote pages, receive back information about the data that is going to arrive, and receive the page data itself. travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
It will be used by the lazy-pages daemon to enable polling for reception of page data from the remote dump. travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
In the early days of uffd.c the return value from uffd_copy was used to count transferred pages. Since this is no longer the case, we can use 0 for success. travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
Currently the lazy-pages daemon uses either pr->read_pages or get_remote_pages to get the actual page data from local images or from the remote server. From now on, page_read will be completely responsible for getting the page data. travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Mike Rapoport authored
travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
Currently we allocate a single page to use as an intermediate buffer for holding data that will be used in UFFDIO_COPY. Let's allocate a buffer per process and make that buffer large enough to hold the largest contiguous chunk. travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
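Sizing such a per-process buffer amounts to finding the longest chunk and allocating once. This is a rough sketch, assuming the chunks are described by a `struct iovec` array; `largest_chunk` and `alloc_chunk_buf` are hypothetical helper names, not functions taken from uffd.c.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

/* Return the length in bytes of the largest chunk in the iovec array. */
static size_t largest_chunk(const struct iovec *iov, size_t n)
{
	size_t max = 0;

	for (size_t i = 0; i < n; i++)
		if (iov[i].iov_len > max)
			max = iov[i].iov_len;
	return max;
}

/*
 * Allocate one page-aligned buffer that can hold any single chunk.
 * Returns NULL when there are no chunks or the allocation fails.
 */
static void *alloc_chunk_buf(const struct iovec *iov, size_t n)
{
	void *buf = NULL;
	size_t len = largest_chunk(iov, n);
	long page = sysconf(_SC_PAGESIZE);

	if (len == 0 || page <= 0)
		return NULL;
	if (posix_memalign(&buf, (size_t)page, len))
		return NULL;
	return buf;
}
```

The buffer is allocated once and reused for every copy, so the cost of the larger allocation is paid a single time per process.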
-
Mike Rapoport authored
page_read->seek_page was restored to skip zero pagemaps, therefore we should check its return value rather than the underlying PME. travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
Inline relevant parts of get_page inside uffd_handle_page and call uffd_{copy,zero}_page after we've got the data. travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Pavel Emelyanov authored
We will want to poll not only a bunch of uffd-s, but also the lazy socket, so here's "an fd and a callback" object to be pushed into epoll. travis-ci: success for uffd: A new set of improvements Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com>
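The "an fd and a callback" idea maps naturally onto the `data.ptr` field of `struct epoll_event`. Below is a minimal sketch of that pattern; the names `struct epoll_rfd`, `epoll_add_rfd`, `epoll_run_once` and `drain_byte` are assumptions made for illustration and need not match the actual patch.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical "an fd and a callback" object. */
struct epoll_rfd {
	int fd;
	int (*revent)(struct epoll_rfd *rfd); /* called when fd becomes readable */
};

/* Register the object; the object itself travels as the event payload. */
static int epoll_add_rfd(int epfd, struct epoll_rfd *rfd)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.ptr = rfd };

	return epoll_ctl(epfd, EPOLL_CTL_ADD, rfd->fd, &ev);
}

/* Wait once and dispatch every ready object. Returns events handled or -1. */
static int epoll_run_once(int epfd, int timeout_ms)
{
	struct epoll_event evs[8];
	int n = epoll_wait(epfd, evs, 8, timeout_ms);

	for (int i = 0; i < n; i++) {
		struct epoll_rfd *rfd = evs[i].data.ptr;

		if (rfd->revent(rfd) < 0)
			return -1;
	}
	return n;
}

/* Example callback: consume one byte from the fd. */
static int drain_byte(struct epoll_rfd *rfd)
{
	char c;

	return read(rfd->fd, &c, 1) == 1 ? 0 : -1;
}
```

With this shape the loop does not need to know whether a ready descriptor is a uffd or the lazy socket; each object brings its own handler.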
-
Mike Rapoport authored
Instead of tracking memory handled by userfaultfd on a per-page basis we can use IOVs for contiguous chunks. travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
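One consequence of tracking chunks rather than pages is that handling part of a chunk leaves up to two smaller chunks behind. The sketch below illustrates just that step; `iov_punch` is a hypothetical helper, and it assumes the handled range lies entirely inside one tracked chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/*
 * Remove the handled range [addr, addr + len) from one tracked chunk.
 * Up to two chunks remain (before and after the hole); they are written
 * to out[] and their count is returned. Returns -1 when the range is not
 * fully inside the chunk.
 */
static int iov_punch(const struct iovec *iov, uintptr_t addr, size_t len,
		     struct iovec out[2])
{
	uintptr_t start = (uintptr_t)iov->iov_base;
	uintptr_t end = start + iov->iov_len;
	int n = 0;

	if (addr < start || len > iov->iov_len ||
	    addr - start > iov->iov_len - len)
		return -1;

	if (addr > start) {
		/* Part of the chunk before the handled range. */
		out[n].iov_base = (void *)start;
		out[n].iov_len = addr - start;
		n++;
	}
	if (addr + len < end) {
		/* Part of the chunk after the handled range. */
		out[n].iov_base = (void *)(addr + len);
		out[n].iov_len = end - (addr + len);
		n++;
	}
	return n;
}
```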
-
Pavel Emelyanov authored
Right now zdtm.py hacks around the core code and waits for a second for the socket to appear. Let's instead add a proper --daemon mode and pidfile generation to the lazy-pages daemon. Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com>
-
Mike Rapoport authored
Instead of creating mm-related parts of restore info in process tree we can directly use MmEntry for VMA traversals. Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
Move the find_vmas and collect_uffd_pages functions before the place where they are actually used. This allows dropping the forward declaration of find_vmas and will make subsequent refactoring cleaner. Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Pavel Emelyanov authored
The event received should be checked to be #PF before accessing its other arguments.

[ Mike: Well, looking forward to seeing non-cooperative userfaultfd patches in the kernel, we should have something like

	static int handle_uffd_event(struct lazy_pages_info *lpi)
	{
		read(&msg...);
		switch (msg.event) {
		case UFFD_EVENT_PAGEFAULT:
			handle_pagefault(lpi, msg);
			break;
		default:
			return -1;
		}
	}

  But since this patch is a bugfix anyway: <ack> ]

travis-ci: success for uffd: A set of improvements over criu/uffd.c Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com>
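The check this commit asks for can be sketched as a small helper that refuses to look at the page-fault arguments unless the event type matches. It uses `struct uffd_msg` and `UFFD_EVENT_PAGEFAULT` from `<linux/userfaultfd.h>`; the helper name `uffd_msg_fault_addr` is an assumption, and the message is taken to have been read from the uffd already.

```c
#include <assert.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <string.h>

/*
 * Extract the faulting address from a message already read from the uffd.
 * Returns 0 and stores the address for page faults; returns -1 for any
 * other event without touching the page-fault arguments.
 */
static int uffd_msg_fault_addr(const struct uffd_msg *msg, uint64_t *addr)
{
	if (msg->event != UFFD_EVENT_PAGEFAULT)
		return -1;

	*addr = msg->arg.pagefault.address;
	return 0;
}
```

Because `arg` is a union, reading `arg.pagefault` for a different event type would silently interpret unrelated fields as an address, which is why the type check has to come first.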
-
Pavel Emelyanov authored
After previous patch we no longer need this hash since we don't need fd -> lpi conversion. travis-ci: success for uffd: A set of improvements over criu/uffd.c Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com>
-
Pavel Emelyanov authored
This helps us get lpi MUCH faster on #PF. travis-ci: success for uffd: A set of improvements over criu/uffd.c Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com>
-
Pavel Emelyanov authored
This avoids excessive memcpy() one instruction below. travis-ci: success for uffd: A set of improvements over criu/uffd.c Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com>
-
Kir Kolyshkin authored
When errno is set, we need to use pr_perror() to print it. When errno is not set, we should use pr_err(). pr_perror() doesn't need a colon or a newline. pr_err() needs a newline. Cc: Adrian Reber <areber@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> travis-ci: success for Assorted nitpicks Signed-off-by:
Kir Kolyshkin <kir@openvz.org> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
Use relative path for UNIX socket instead of absolute one. This ensures we won't run into problems with invalid socket names. travis-ci: success for lazy-pages: use relative path for UNIX socket Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
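The commit does not say what made a socket name invalid. One common cause, assumed here, is a path that does not fit into the fixed-size `sun_path` field of `struct sockaddr_un`, which a short relative name avoids. The sketch below shows the length check; `fill_unix_addr` and the name `lazy.sock` are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Fill a UNIX socket address from a path. Returns 0 on success, or -1
 * when the path is empty or does not fit into sun_path together with
 * its terminating NUL.
 */
static int fill_unix_addr(struct sockaddr_un *sun, const char *path)
{
	size_t len = strlen(path);

	if (len == 0 || len >= sizeof(sun->sun_path))
		return -1;

	memset(sun, 0, sizeof(*sun));
	sun->sun_family = AF_UNIX;
	memcpy(sun->sun_path, path, len + 1);
	return 0;
}
```

A relative name is resolved against the current working directory, so the caller has to change into the intended directory before calling bind() or connect().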
-
Mike Rapoport authored
Restore of a zombie process does not call setup_uffd, which causes the lazy-pages daemon to get stuck forever waiting for a (pid, uffd) pair to arrive. Let's extend the protocol between restore and lazy-pages so that for a zombie process a (0, -1) pair will be sent instead of the actual (uffd, pid). travis-ci: success for lazy-pages: misc fixes (rev4) Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
travis-ci: success for lazy-pages: misc fixes (rev4) Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
To properly handle zombie processes we will need to distinguish failures coming from socket communications from an absent userfault file descriptor. travis-ci: success for lazy-pages: misc fixes (rev4) Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
When a VMA is mapped with MAP_LOCKED, its address space is populated with pages, which causes UFFDIO_COPY to return -EEXIST. Until we can find a better solution, let's avoid marking VMAs with MAP_LOCKED as lazy. Fixes: #238 travis-ci: success for lazy-pages: misc fixes (rev3) Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
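The rule in this commit reduces to a single flag test. The following sketch covers only the MAP_LOCKED part; `vma_can_be_lazy` is a hypothetical name, and a real check would also look at other properties of the mapping.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical predicate: may a mapping with these mmap() flags be restored lazily? */
static bool vma_can_be_lazy(int map_flags)
{
	/*
	 * MAP_LOCKED mappings are populated up front, so a later
	 * UFFDIO_COPY would find the pages already present.
	 */
	return !(map_flags & MAP_LOCKED);
}
```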
-
Mike Rapoport authored
Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Adrian Reber authored
	# criu dump --display-stats -D /tmp/cp -t <PID>
	Displaying dump stats:
	...
	Zero memory pages: 0 (0x0)
	Lazy memory pages: 0 (0x0)

travis-ci: success for Added option to display dump/restore stats (rev2) Signed-off-by:
Adrian Reber <areber@redhat.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Adrian Reber authored
Only the UFFD daemon is aware if pages are in the parent or not. The restore will continue to work as any lazy-restore except that pages from parent checkpoints will be pre-populated by the restorer. The restorer will still register the whole memory region as being handled by userfaultfd even if it contains pages from parent checkpoints. Userfaultfd page faults will only happen on pages which contain no data. This means that pages pre-populated from the parent will not trigger a userfaultfd message even if they are marked as being handled by userfaultfd. The UFFD daemon knows about pages which are available in the parent checkpoints and will not push those pages unnecessarily to userfaultfd.

The following steps to migrate a process are now possible:

Source system:
 * criu pre-dump -D /tmp/cp/1 -t <PID>
 * rsync -a /tmp/cp <destination>:/tmp
 * criu dump -D /tmp/cp/2 -t <PID> --port 27 --lazy-pages \
   --prev-images-dir ../1/ --track-mem

Destination system:
 * rsync -a <source>:/tmp/cp /tmp/
 * criu lazy-pages --page-server --address <source> --port 27 \
   -D /tmp/cp/2 &
 * criu restore --lazy-pages -D /tmp/cp/2

This will now restore all pages from the parent checkpoint if they are not marked as lazy in the second checkpoint.

v2:
 - changed parent detection to use pagemap_in_parent()

v3:
 - unfortunately this reverts c11cf95afbe023a2816a3afaecb65cc4fee670d7 "criu: mem: skip lazy pages during restore based on pagemap info". To be able to split the VMA-s in the right chunks for the restorer it is necessary to make the lazy-or-not decision on the VmaEntry level.

v4:
 - everything has changed thanks to Mike Rapoport's suggestion
 - the VMA-s are no longer touched or split
 - instead of over 100 lines of changes this is now a two-line patch

Signed-off-by:
Adrian Reber <areber@redhat.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Adrian Reber authored
Combining pre-copy (pre-dump) and post-copy (lazy-pages) mode showed a problem in the function page_pipe_split_ppb(). The function is used to split the page-pipe-buffer so that it only contains the IOVs requested by the restore side during lazy restore. Unfortunately it only splits the leading IOVs out of the page-pipe-buffer and not the trailing ones.

Before split for requested address 0x7f27284d1000:

	page-pipe: ppb->iov 0x7f0f74d93040
	page-pipe: 0x7f27282bb000 1
	page-pipe: 0x7f27284d1000 1
	page-pipe: 0x7f27284dd000 2

After split:

	page-pipe: ppb->iov 0x7f0f74d93050
	page-pipe: 0x7f27284d1000 1
	page-pipe: 0x7f27284dd000 2

and:

	page-pipe: ppb->iov 0x7f0f74d93040
	page-pipe: 0x7f27282bb000 1

This patch keeps on splitting the page-pipe-buffer until it contains only the requested address with the requested length.

After split (still trying to load 0x7f27284d1000):

	page-pipe: ppb->iov 0x7f0f74d93050
	page-pipe: 0x7f27284d1000 1

and:

	page-pipe: ppb->iov 0x7f0f74d93040
	page-pipe: 0x7f27282bb000 1

and:

	page-pipe: ppb->iov 0x7f0f74d93060
	page-pipe: 0x7f27284dd000 2

v2:
 - moved declarations to the declaration block

Signed-off-by:
Adrian Reber <areber@redhat.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
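The three-way split described in this commit (leading IOVs, the requested IOV, trailing IOVs) can be illustrated on a simplified model. This is only a sketch and not the code of page_pipe_split_ppb(): `struct ppb_iov`, `struct ppb_parts` and `ppb_split_around` are hypothetical, and the request is assumed to match exactly one IOV, so no IOV itself has to be cut.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one IOV of a page-pipe buffer. */
struct ppb_iov {
	uint64_t addr;
	unsigned int nr_pages;
};

/* Index ranges [begin, end) into the original IOV array. */
struct ppb_parts {
	size_t lead_begin, lead_end;   /* IOVs before the request */
	size_t req_begin, req_end;     /* the requested IOV */
	size_t trail_begin, trail_end; /* IOVs after the request */
};

/*
 * Partition iov[0..n) around the IOV that starts at addr and spans
 * nr_pages. Returns 0 on success, -1 when no IOV matches exactly.
 */
static int ppb_split_around(const struct ppb_iov *iov, size_t n,
			    uint64_t addr, unsigned int nr_pages,
			    struct ppb_parts *out)
{
	for (size_t i = 0; i < n; i++) {
		if (iov[i].addr != addr || iov[i].nr_pages != nr_pages)
			continue;

		out->lead_begin = 0;
		out->lead_end = i;
		out->req_begin = i;
		out->req_end = i + 1;
		out->trail_begin = i + 1;
		out->trail_end = n;
		return 0;
	}
	return -1;
}
```

Applied to the three IOVs from the commit message with the request 0x7f27284d1000 and one page, the leading part holds the first IOV, the requested part the second and the trailing part the third.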
-
Mike Rapoport authored
Currently, potentially lazy pages are not counted as written even if they are dumped into pages*img. Count these pages as "pages_written" when dump is not going to skip writing lazy pages to disk. travis-ci: success for criu: mem: count all pages actually written to image as "pages_written" Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-