- 16 Sep, 2017 40 commits
-
-
Mike Rapoport authored
travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
Currently we allocate a single page to use as an intermediate buffer for holding data that will be used in UFFDIO_COPY. Let's allocate a buffer per process and make that buffer large enough to hold the largest contiguous chunk. travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
page_read->seek_page was restored to skip zero pagemaps, therefore we should check its return value rather than underlying PME. travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
Inline relevant parts of get_page inside uffd_handle_page and call uffd_{copy,zero}_page after we've got the data. travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Pavel Emelyanov authored
We will want to poll not only a bunch of uffd-s, but also the lazy socket, so here's "an fd and a callback" object to be pushed into epoll. travis-ci: success for uffd: A new set of improvements Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com>
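A minimal sketch of the "an fd and a callback" idea described above, assuming a plain epoll loop. The names `epoll_rfd`, `epoll_add_rfd`, `epoll_run_once` and `drain_one_byte` are illustrative and are not taken from the patch; only the epoll calls themselves are real APIs.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical "fd plus callback" object; the layout is an assumption. */
struct epoll_rfd {
	int fd;
	int (*revent)(struct epoll_rfd *rfd); /* called when fd is readable */
};

/* Register the object with epoll; the object pointer travels in data.ptr. */
static int epoll_add_rfd(int epollfd, struct epoll_rfd *rfd)
{
	struct epoll_event ev = { .events = EPOLLIN };

	ev.data.ptr = rfd;
	return epoll_ctl(epollfd, EPOLL_CTL_ADD, rfd->fd, &ev);
}

/*
 * Wait once and hand every ready object to its own callback.
 * Returns the number of handled events, or -1 on error.
 */
static int epoll_run_once(int epollfd, int timeout_ms)
{
	struct epoll_event evs[8];
	int i, n;

	n = epoll_wait(epollfd, evs, 8, timeout_ms);
	if (n < 0)
		return -1;

	for (i = 0; i < n; i++) {
		struct epoll_rfd *rfd = evs[i].data.ptr;

		if (rfd->revent(rfd) < 0)
			return -1;
	}
	return n;
}

/* Example callback: consume a single byte from the descriptor. */
static int drain_one_byte(struct epoll_rfd *rfd)
{
	char c;

	return read(rfd->fd, &c, 1) == 1 ? 0 : -1;
}
```

Because the loop only sees `struct epoll_rfd`, a uffd and a socket can share one epoll instance as long as each supplies its own callback.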
-
Mike Rapoport authored
Instead of tracking memory handled by userfaultfd on a per-page basis, we can use IOVs for contiguous chunks. travis-ci: success for uffd: A new set of improvements Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
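To illustrate why IOVs are a more compact representation than per-page tracking, here is a small sketch that merges a sorted list of page addresses into `struct iovec` entries. The function name `pages_to_iovs` and the fixed 4 KiB `SKETCH_PAGE_SIZE` are assumptions made for the example, not code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumed page size for the example */

/*
 * Merge sorted page addresses into iovecs covering contiguous runs.
 * Returns the number of iovecs written, or -1 if 'out' is too small.
 */
static int pages_to_iovs(const uintptr_t *pages, size_t nr_pages,
			 struct iovec *out, size_t max_iovs)
{
	size_t i, n = 0;

	for (i = 0; i < nr_pages; i++) {
		if (n > 0) {
			struct iovec *last = &out[n - 1];
			uintptr_t end = (uintptr_t)last->iov_base + last->iov_len;

			/* The page directly follows the previous run: extend it. */
			if (end == pages[i]) {
				last->iov_len += SKETCH_PAGE_SIZE;
				continue;
			}
		}

		if (n == max_iovs)
			return -1;

		/* Start a new run at this page. */
		out[n].iov_base = (void *)pages[i];
		out[n].iov_len = SKETCH_PAGE_SIZE;
		n++;
	}
	return (int)n;
}
```

Six pages forming three contiguous runs need only three entries here, where per-page tracking would keep six.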
-
Pavel Emelyanov authored
Right now zdtm.py hacks around the core code and waits for a second for the socket to appear. Let's instead add a proper --daemon mode, with pidfile generation, to the lazy-pages daemon. Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com>
-
Mike Rapoport authored
Instead of creating mm-related parts of restore info in process tree we can directly use MmEntry for VMA traversals. Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
Move the find_vmas and collect_uffd_pages functions before the point where they are actually used. This allows dropping the forward declaration of find_vmas and will make subsequent refactoring cleaner. Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Pavel Emelyanov authored
The event received should be checked to be #PF before accessing its other arguments. [ Mike: Well, looking forward to seeing non-cooperative userfaultfd patches in the kernel, we should have something like static int handle_uffd_event(struct lazy_pages_info *lpi) { read(&msg...); switch (msg.event) { case UFFD_EVENT_PAGEFAULT: handle_pagefault(lpi, msg); break; default: return -1; } } But since this patch is a bugfix anyway: <ack> ] travis-ci: success for uffd: A set of improvements over criu/uffd.c Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com>
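The dispatch outlined in the bracketed comment above can be sketched as compilable C. This is an assumption-laden illustration: `struct lazy_pages_info_sketch` is a made-up stand-in for the real `lazy_pages_info`, and the message is passed in by the caller instead of being read from the uffd so the logic can be exercised without a live userfaultfd. `struct uffd_msg` and `UFFD_EVENT_PAGEFAULT` come from `<linux/userfaultfd.h>`.

```c
#include <assert.h>
#include <string.h>
#include <linux/userfaultfd.h>

/* Hypothetical stand-in for the per-process state kept by the daemon. */
struct lazy_pages_info_sketch {
	unsigned long long last_fault_addr;
	unsigned int nr_faults;
};

/* Only reached for page-fault events, so arg.pagefault is valid here. */
static int handle_pagefault(struct lazy_pages_info_sketch *lpi,
			    const struct uffd_msg *msg)
{
	lpi->last_fault_addr = msg->arg.pagefault.address;
	lpi->nr_faults++;
	return 0;
}

/* Check the event type first; reject anything that is not a page fault. */
static int handle_uffd_event(struct lazy_pages_info_sketch *lpi,
			     const struct uffd_msg *msg)
{
	switch (msg->event) {
	case UFFD_EVENT_PAGEFAULT:
		return handle_pagefault(lpi, msg);
	default:
		return -1;
	}
}
```

The point of the ordering is that the union in `struct uffd_msg` is interpreted only after `event` has been checked.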
-
Pavel Emelyanov authored
After previous patch we no longer need this hash since we don't need fd -> lpi conversion. travis-ci: success for uffd: A set of improvements over criu/uffd.c Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com>
-
Pavel Emelyanov authored
This helps us get lpi MUCH faster on #PF. travis-ci: success for uffd: A set of improvements over criu/uffd.c Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com>
-
Pavel Emelyanov authored
This avoids excessive memcpy() one instruction below. travis-ci: success for uffd: A set of improvements over criu/uffd.c Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com>
-
Kir Kolyshkin authored
In cases where errno is set, we need to use pr_perror() to print it. In cases where errno is not set, we should use pr_err(). pr_perror() doesn't need a colon or a newline. pr_err() needs a newline. Cc: Adrian Reber <areber@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> travis-ci: success for Assorted nitpicks Signed-off-by:
Kir Kolyshkin <kir@openvz.org> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
Use relative path for UNIX socket instead of absolute one. This ensures we won't run into problems with invalid socket names. travis-ci: success for lazy-pages: use relative path for UNIX socket Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
Restore of a zombie process does not call setup_uffd, which causes the lazy-pages daemon to get stuck forever waiting for a (pid, uffd) pair to arrive. Let's extend the protocol between restore and lazy-pages so that for a zombie process a (0, -1) pair will be sent instead of the actual (uffd, pid). travis-ci: success for lazy-pages: misc fixes (rev4) Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
travis-ci: success for lazy-pages: misc fixes (rev4) Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
To properly handle zombie processes we will need to distinguish failures coming from socket communications from an absent userfault file descriptor. travis-ci: success for lazy-pages: misc fixes (rev4) Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
When a VMA is mapped with MAP_LOCKED, its address space is populated with pages, which causes UFFDIO_COPY to return -EEXIST. Until we can find a better solution, let's avoid marking VMAs with MAP_LOCKED as lazy. Fixes: #238 travis-ci: success for lazy-pages: misc fixes (rev3) Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Adrian Reber authored
# criu dump --display-stats -D /tmp/cp -t <PID> Displaying dump stats: ... Zero memory pages: 0 (0x0) Lazy memory pages: 0 (0x0) travis-ci: success for Added option to display dump/restore stats (rev2) Signed-off-by:
Adrian Reber <areber@redhat.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Adrian Reber authored
Only the UFFD daemon is aware if pages are in the parent or not. The restore will continue to work as any lazy-restore except that pages from parent checkpoints will be pre-populated by the restorer. The restorer will still register the whole memory region as being handled by userfaultfd even if it contains pages from parent checkpoints. Userfaultfd page faults will only happen on pages which contain no data. This means that pages pre-populated from the parent will not trigger a userfaultfd message even if marked as being handled by userfaultfd. The UFFD daemon knows about pages which are available in the parent checkpoints and will not push those pages unnecessarily to userfaultfd.
The following steps to migrate a process are now possible:
Source system:
* criu pre-dump -D /tmp/cp/1 -t <PID>
* rsync -a /tmp/cp <destination>:/tmp
* criu dump -D /tmp/cp/2 -t <PID> --port 27 --lazy-pages \
  --prev-images-dir ../1/ --track-mem
Destination system:
* rsync -a <source>:/tmp/cp /tmp/
* criu lazy-pages --page-server --address <source> --port 27 \
  -D /tmp/cp/2 &
* criu restore --lazy-pages -D /tmp/cp/2
This will now restore all pages from the parent checkpoint if they are not marked as lazy in the second checkpoint.
v2:
- changed parent detection to use pagemap_in_parent()
v3:
- unfortunately this reverts c11cf95afbe023a2816a3afaecb65cc4fee670d7 "criu: mem: skip lazy pages during restore based on pagemap info". To be able to split the VMA-s in the right chunks for the restorer it is necessary to make the decision lazy or not on the VmaEntry level.
v4:
- everything has changed thanks to Mike Rapoport's suggestion
- the VMA-s are no longer touched or split
- instead of over 100 lines of changes this is now a two-line patch
Signed-off-by:
Adrian Reber <areber@redhat.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Adrian Reber authored
Combining pre-copy (pre-dump) and post-copy (lazy-pages) mode showed a problem in the function page_pipe_split_ppb(). The function is used to split the page-pipe-buffer so that it only contains the IOVs requested by the restore side during lazy restore. Unfortunately it only splits the leading IOVs out of the page-pipe-buffer and not the trailing ones.
Before split for requested address 0x7f27284d1000:
page-pipe: ppb->iov 0x7f0f74d93040
page-pipe: 0x7f27282bb000 1
page-pipe: 0x7f27284d1000 1
page-pipe: 0x7f27284dd000 2
After split:
page-pipe: ppb->iov 0x7f0f74d93050
page-pipe: 0x7f27284d1000 1
page-pipe: 0x7f27284dd000 2
and:
page-pipe: ppb->iov 0x7f0f74d93040
page-pipe: 0x7f27282bb000 1
This patch keeps on splitting the page-pipe-buffer until it contains only the requested address with the requested length.
After split (still trying to load 0x7f27284d1000):
page-pipe: ppb->iov 0x7f0f74d93050
page-pipe: 0x7f27284d1000 1
and:
page-pipe: ppb->iov 0x7f0f74d93040
page-pipe: 0x7f27282bb000 1
and:
page-pipe: ppb->iov 0x7f0f74d93060
page-pipe: 0x7f27284dd000 2
v2:
- moved declarations to the declaration block
Signed-off-by:
Adrian Reber <areber@redhat.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
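The splitting rule from the commit above (cut off both the leading and the trailing IOVs so that only the requested one remains) can be sketched as follows. This is a simplified model, not the real page_pipe_split_ppb(): `struct ppb_iov`, `struct ppb_split` and `split_around` are hypothetical names, and the request is assumed to match one entry exactly rather than fall inside a larger IOV.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of a page-pipe buffer: start address and length in pages. */
struct ppb_iov {
	uint64_t addr;
	unsigned int nr_pages;
};

/* Where to cut: entries before the request, the request, entries after. */
struct ppb_split {
	size_t lead_nr;  /* entries to split off in front */
	size_t req_idx;  /* index of the requested entry */
	size_t trail_nr; /* entries to split off behind */
};

/*
 * Find the entry matching (addr, nr_pages) and report both cuts.
 * Returns 0 on success, -1 if no entry matches the request.
 */
static int split_around(const struct ppb_iov *iov, size_t nr,
			uint64_t addr, unsigned int nr_pages,
			struct ppb_split *out)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (iov[i].addr != addr || iov[i].nr_pages != nr_pages)
			continue;

		out->lead_nr = i;
		out->req_idx = i;
		out->trail_nr = nr - i - 1; /* the cut the old code missed */
		return 0;
	}
	return -1;
}
```

With the three entries from the log above and a request for 0x7f27284d1000, one entry is cut off in front and one behind, which matches the three buffers shown after the fix.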
-
Mike Rapoport authored
Currently potentially lazy pages are not counted as written even if they are dumped into pages*img. Count these pages as "pages_written" when dump is not going to skip writing lazy pages to disk. travis-ci: success for criu: mem: count all pages actually written to image as "pages_written" Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Adrian Reber authored
This translates pagemap flags into strings for easier readability. Signed-off-by:
Adrian Reber <areber@redhat.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Pavel Emelyanov authored
Do the same here: the flags field is now enough to tell a hole from a pagemap. Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com>
-
Pavel Emelyanov authored
They are now the same and PE_PRESENT bit helps us distinguish holes from pagemaps having pages inside. Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com>
-
Pavel Emelyanov authored
The ->write_hole and ->write_pagemap now look very much alike, so let's merge them. This is a preparatory patch that makes the hole type decision based on PE_FOO flags. Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com> Acked-by:
Mike Rapoport <rppt@linux.vnet.ibm.com>
-
Mike Rapoport authored
Instead of checking whether the VMA containing a page can be lazy for each page, skip the entire parts of pagemap that have PE_LAZY flag set. Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
The PE_PRESENT flag is always set for pagemap entries that have corresponding pages in the pages*img. Pagemap entries describing a hole, either with zero pages or with pages in the parent snapshot, will not have the PE_PRESENT flag set. A pagemap entry that may be lazily restored is a special case. For the lazy restore from disk case, both PE_LAZY and PE_PRESENT will be set in the pagemap, but for the remote lazy pages case only PE_LAZY will be set. Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
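The flag combinations described above map onto four cases, which the following sketch spells out. The bit values chosen for `PE_LAZY` and `PE_PRESENT`, the `pe_kind` enum and `classify_pagemap_flags` are assumptions for illustration; the actual definitions live in the CRIU sources and may differ.

```c
#include <assert.h>

/* Bit positions are assumed for this example only. */
#define PE_LAZY    (1u << 0)
#define PE_PRESENT (1u << 1)

enum pe_kind {
	PE_KIND_HOLE,        /* no pages in pages*img: zero or parent pages */
	PE_KIND_EAGER,       /* pages in pages*img, restored up front */
	PE_KIND_LAZY_LOCAL,  /* lazy restore, pages available on disk */
	PE_KIND_LAZY_REMOTE, /* lazy restore, pages fetched remotely */
};

/* Classify a pagemap entry from its flags, following the rules above. */
static enum pe_kind classify_pagemap_flags(unsigned int flags)
{
	int lazy = !!(flags & PE_LAZY);
	int present = !!(flags & PE_PRESENT);

	if (lazy)
		return present ? PE_KIND_LAZY_LOCAL : PE_KIND_LAZY_REMOTE;

	return present ? PE_KIND_EAGER : PE_KIND_HOLE;
}
```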
-
Mike Rapoport authored
With the 'zero' and 'lazy' booleans replaced by the flags field in PagemapEntry, page-xfer must be made aware of the change. Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
Having three booleans in the pagemap entry calls for the use of good old flags. Replace the 'zero' and 'lazy' booleans with flags and use flags for internal tracking of the in_parent value. Eventually, in_parent may be deprecated. Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
Create the socket early so that it will be available after restoring the namespaces. Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Mike Rapoport authored
Very minimalistic at the moment, no remote pages and namespaces. Still better than nothing :) Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Andrew Vagin authored
Cc: Adrian Reber <areber@redhat.com> Signed-off-by:
Andrew Vagin <avagin@virtuozzo.com> Signed-off-by:
Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-
Eugene Batalov authored
We'll use it in anon shmem dedup so we need to have access to it in shmem.c Signed-off-by:
Eugene Batalov <eabatalov89@gmail.com> Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com>
-