Try to include userfaultfd with criu (part 2)
This is a first try to include userfaultfd with criu. Right now it
still requires a "normal" checkpoint. After checkpointing the
application it can be restored with the help of userfaultfd.
All restored pages with MAP_ANONYMOUS and MAP_PRIVATE set are marked as
being handled by userfaultfd.
As soon as the process is restored it blocks on the first memory access
and waits for the pages to be transferred via userfaultfd.
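The selection rule above (only private, anonymous mappings are handed to userfaultfd) can be sketched as a simple flag test. This is a minimal illustration, not criu's code: `struct vma_sketch` and `vma_sketch_can_be_lazy()` are hypothetical names, and the real check may consider more conditions than these two flags.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical, simplified description of one memory mapping. */
struct vma_sketch {
	unsigned long start;	/* first address of the mapping */
	unsigned long end;	/* first address past the mapping */
	int flags;		/* mmap() flags of the mapping */
};

/* Returns true if the mapping is both anonymous and private, which is
 * the condition described above for marking pages as uffd handled. */
static bool vma_sketch_can_be_lazy(const struct vma_sketch *vma)
{
	assert(vma && vma->end > vma->start);

	return (vma->flags & MAP_ANONYMOUS) && (vma->flags & MAP_PRIVATE);
}
```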
To handle the required pages a new criu command has been added. For a
userfaultfd-supported restore the first step is to start the
'lazy-pages' server:
  criu lazy-pages -v4 -D /tmp/3/ --address /tmp/userfault.socket
This waits on a unix domain socket (defined using the --address option)
to receive a userfaultfd file descriptor from a '--lazy-pages' enabled
'criu restore':
  criu restore -D /tmp/3 -j -v4 --lazy-pages \
      --address /tmp/userfault.socket
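Passing a file descriptor between two processes over a unix domain socket is done with an SCM_RIGHTS control message. The following is a generic sketch of that mechanism, assuming an already connected socket. The names `sketch_send_fd()` and `sketch_recv_fd()` are hypothetical and this is not the send_fd()/recv_fd() implementation provided by criu.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer large enough for exactly one file descriptor. */
union sketch_ctl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send 'fd' over the connected unix socket 'sock'. Returns 0 or -1. */
static int sketch_send_fd(int sock, int fd)
{
	char dummy = '*';	/* at least one byte of payload is sent */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union sketch_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(sock >= 0 && fd >= 0);
	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from 'sock'. Returns the new fd or -1. */
static int sketch_recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union sketch_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

In the setup described here, the 'criu restore' side would play the sender role and the lazy-pages server the receiver role.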
In the first step the VDSO pages are pushed from the lazy-pages server
into the restored process. After that the lazy-pages server waits on the
UFFD FD for a UFFD requested page. If there are no requests received
during a period of 5 seconds the lazy-pages server switches into a mode
where the remaining, non-transferred pages are copied into the
destination process. After all remaining pages have been copied the
lazy-pages server exits.
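The "wait for a request, fall back to copying after 5 idle seconds" behaviour can be sketched with poll() and a timeout. This is only an illustration under assumptions: the message does not say which wait primitive the lazy-pages server uses, all names below are hypothetical, and a single byte stands in for a real userfaultfd message.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* The idle period named in the text above. */
#define SKETCH_IDLE_TIMEOUT_MS (5 * 1000)

/* Wait until 'fd' becomes readable.
 * Returns 1 if a request is pending, 0 if the timeout expired, -1 on error.
 * Note: after EINTR the full timeout is simply started again. */
static int sketch_wait_for_request(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	assert(fd >= 0);
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -1;
	if (ret == 0)
		return 0;
	return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Handle requests until none arrives within 'timeout_ms'; the caller
 * would then switch to copying all remaining pages.
 * Returns the number of requests handled. */
static int sketch_serve_until_idle(int fd, int timeout_ms)
{
	int handled = 0;
	char request;	/* stand-in for a real request message */

	while (sketch_wait_for_request(fd, timeout_ms) == 1) {
		if (read(fd, &request, 1) != 1)
			break;
		handled++;	/* a real server would copy the page here */
	}
	return handled;
}
```

A caller would pass SKETCH_IDLE_TIMEOUT_MS as the timeout and start the copy of the remaining pages once sketch_serve_until_idle() returns.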
The first page that usually is requested is a VDSO page. The process
currently used for restoring has two VDSO pages, but only one is
requested via userfaultfd. In the second part, where the remaining pages
are copied into the process, the second VDSO page is also copied into
the process as it has not been requested previously. Unfortunately, even
though this page has not been requested before, it is not accepted by
userfaultfd; EINVAL is returned. The reason for EINVAL is not understood.
Therefore the VDSO pages are copied into the process first, and only
then does the server switch to request mode and copy the pages which are
requested via userfaultfd. To decide at which point the VDSO pages can
be copied into the process, the lazy-pages server currently waits for
the first page requested via userfaultfd. This is one of the VDSO pages.
To avoid copying a page a second time, which is unnecessary and not
possible, there is now a check to see if the page has been transferred
previously.
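One way to implement such an "already transferred" check is a bitmap with one bit per page. The message does not describe how the check is actually implemented, so the following is purely an assumed sketch with hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical set of page indexes that were already transferred. */
struct sketch_page_set {
	unsigned long *bits;	/* one bit per page, zero-initialized */
	size_t nr_pages;	/* number of pages tracked */
};

/* Allocate an empty set for 'nr_pages' pages. Returns 0 or -1. */
static int sketch_page_set_init(struct sketch_page_set *s, size_t nr_pages)
{
	size_t words = (nr_pages + SKETCH_BITS_PER_WORD - 1) /
		       SKETCH_BITS_PER_WORD;

	s->bits = calloc(words ? words : 1, sizeof(unsigned long));
	s->nr_pages = nr_pages;
	return s->bits ? 0 : -1;
}

/* Mark page 'idx' as transferred.
 * Returns true if the page was newly marked (the copy should happen),
 * false if it had been transferred before (the copy must be skipped). */
static bool sketch_page_set_mark(struct sketch_page_set *s, size_t idx)
{
	unsigned long mask;

	assert(idx < s->nr_pages);
	mask = 1UL << (idx % SKETCH_BITS_PER_WORD);
	if (s->bits[idx / SKETCH_BITS_PER_WORD] & mask)
		return false;

	s->bits[idx / SKETCH_BITS_PER_WORD] |= mask;
	return true;
}
```

Both the request path and the final copy of the remaining pages would call sketch_page_set_mark() before copying, so each page is copied at most once.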
The use case of using userfaultfd with a checkpointed process on a remote
machine will probably benefit from the current work related to
image-cache and image-proxy.
For the final implementation it would be nice to have a restore running
in uffd mode on one system which requests the memory pages over the
network from another system which is running 'criu checkpoint' also in
uffd mode. This way the pages need to be copied only 'once' from the
checkpoint process to the uffd restore process.
TODO:
* Still contains many debug outputs which need to be cleaned up.
* Maybe transfer the dump directory FD also via unix domain sockets
  so that the 'uffd'/'lazy-pages' server can keep running without
  the need to specify the dump directory with '-D'.
* Keep the lazy-pages server running after all pages have been
  transferred and start waiting for new connections to serve.
* Resurrect the non-cooperative patch set, as once the restored task
  fork()'s or calls mremap() the whole thing becomes broken.
* Figure out if current VDSO handling is correct.
* Figure out when and how zero pages need to be inserted via uffd.
v2:
* provide option '--lazy-pages' to enable uffd style restore
* use send_fd()/recv_fd() provided by criu (instead of own
  implementation)
* do not install the uffd as service_fd
* use named constants for MAP_ANONYMOUS
* do not restore memory pages and then later mark them as uffd
  handled
* remove function find_pages() to search in pages-<id>.img;
  now using criu functions to find the necessary pages;
  for each new page search the pages-<id>.img file is opened
* only check the UFFDIO_API once
* trying to protect uffd code by CONFIG_UFFD;
  use make UFFD=1 to compile criu with this patch
v3:
* renamed the server mode from 'uffd' -> 'lazy-pages'
* switched client and server roles transferring the UFFD FD
  * the criu part running in lazy-pages server mode is now
    waiting for connections
  * the criu restore process connects to the lazy-pages server
    to pass the UFFD FD
* before copying anything else via UFFD the VDSO pages are copied,
  as copying unused VDSO pages fails once the process is running;
  this was necessary to be able to copy all pages.
* if there are no more UFFD messages for 5 seconds the lazy-pages
  server switches into copy mode to copy all remaining pages, which
  have not been requested yet, into the restored process
* check the UFFDIO_API at the correct place
* close UFFD FD in the restorer to remove open UFFD FD in the
  restored process
v4:
* removed unnecessary madvise() calls; they seemed necessary when
  first running tests with uffd, but actually are not
* auto-detect if build-system provides linux/userfaultfd.h
  header.
* simplify unix domain socket setup and communication.
* use --address to specify the location of the used
  unix domain socket.
v5:
* split the userfaultfd patch into multiple smaller patches
* introduced vma_can_be_lazy() function to check if a page
  can be handled by uffd
* moved uffd related code from cr-restore.c to uffd.c
* handle failure to register a memory page of the restored process
  with userfaultfd
v6:
* get the PID of the process to be restored from the 'criu restore'
  process; first the PID is transferred and then the UFFD
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>