- 30 Oct, 2018 40 commits
-
-
Pavel Tikhomirov authored
This also fixes the false-propagation problem of a mount to itself if it is in its parent's share. Signed-off-by:
Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
-
Pavel Tikhomirov authored
1) Redo waiting for parents of a propagation group to be mounted, using pre-found propagation groups.
2) For a shared mount, wait for children of that shared group which have no propagation in our shared mount.

(2) is effectively support for non-uniform shares: two mounts of a shared group can now have different sets of children. We will mount them in the right order, but propagate_mount and validate_shared still prevent c/r-ing such shares; the former will be fixed and the latter removed in separate (next) patches.

Signed-off-by:
Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
-
Pavel Tikhomirov authored
This information will help improve the restore of tricky mount configurations.

Function same_propagation_group checks whether two mounts were created simultaneously through shared mount propagation; the main part of this check is that they must be in exactly the same place inside the share of their parents.

Function root_path_from_parent prints the mountpoint path relative to the root of the parent's share, by first subtracting the parent's mountpoint from our mountpoint and then prepending the parent's root path (relative to the root of its file system), e.g.:

id parent_id root mountpoint
1  0         /    /
2  1         /    /parent_a
3  1         /dir /parent_b
4  2         /    /parent_a/dir/a
5  3         /    /parent_b/a

(Let 2 and 3 be a shared group.)

For mount 4 root_path_from_parent gives:
"/parent_a/dir/a" - "/parent_a" == "/dir/a"
"/" + "/dir/a" == "/dir/a"

For mount 5:
"/parent_b/a" - "/parent_b" == "/a"
"/dir" + "/a" == "/dir/a"

So mounts 4 and 5 are a propagation group.

Signed-off-by:
Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
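
The path arithmetic described in this commit message can be sketched in a few lines. This is an illustrative Python sketch, not the C code from criu/mount.c: the `MOUNTS` table simply restates the example above, and `path_in_parent_share` is a hypothetical helper name. It only compares positions inside the parent's share; the real same_propagation_group presumably checks more than that.

```python
import posixpath

# id -> (parent_id, root, mountpoint), copied from the example table.
MOUNTS = {
    1: (0, "/", "/"),
    2: (1, "/", "/parent_a"),
    3: (1, "/dir", "/parent_b"),
    4: (2, "/", "/parent_a/dir/a"),
    5: (3, "/", "/parent_b/a"),
}


def root_path_from_parent(mountpoint, parent_mountpoint, parent_root):
    """Mountpoint path relative to the root of the parent's share."""
    # Step 1: subtract the parent's mountpoint from our mountpoint.
    rel = posixpath.relpath(mountpoint, parent_mountpoint)
    # Step 2: prepend the parent's root path within its file system.
    return posixpath.normpath(posixpath.join(parent_root, rel))


def path_in_parent_share(mnt_id):
    """Hypothetical helper: look up the parent and apply the function above."""
    parent_id, _root, mountpoint = MOUNTS[mnt_id]
    _, parent_root, parent_mountpoint = MOUNTS[parent_id]
    return root_path_from_parent(mountpoint, parent_mountpoint, parent_root)


if __name__ == "__main__":
    print(path_in_parent_share(4), path_in_parent_share(5))
```

Both mounts 4 and 5 resolve to the same path inside their parents' share, which is the condition the message calls "the main part" of the propagation group check.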
-
Pavel Tikhomirov authored
495 494 0:62 / /zdtm/static/shared_slave_mount_children.test/share rw,relatime shared:235 - tmpfs share rw
496 494 0:62 / /zdtm/static/shared_slave_mount_children.test/slave1 rw,relatime shared:236 master:235 - tmpfs share rw
497 494 0:62 / /zdtm/static/shared_slave_mount_children.test/slave2 rw,relatime shared:236 master:235 - tmpfs share rw
498 496 0:63 / /zdtm/static/shared_slave_mount_children.test/slave1/child rw,relatime shared:237 - tmpfs child rw
499 497 0:63 / /zdtm/static/shared_slave_mount_children.test/slave2/child rw,relatime shared:237 - tmpfs child rw

Before the fix we had:

(00.167574) 1: Error (criu/mount.c:1769): mnt: A few mount points can't be mounted
(00.167577) 1: Error (criu/mount.c:1773): mnt: 498:496 / /tmp/.criu.mntns.o2Op5j/9-0000000000/zdtm/static/shared_slave_mount_children.test/slave1/child child
(00.167580) 1: Error (criu/mount.c:1773): mnt: 497:494 / /tmp/.criu.mntns.o2Op5j/9-0000000000/zdtm/static/shared_slave_mount_children.test/slave2 share

Signed-off-by:
Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
-
Pavel Tikhomirov authored
We should not use the ->bind link for checking the master's children: if we have two slaves shared with each other, the one mounted first will replace the ->bind link for the other, which will break restore.

Also, while at it: if we do not want doubled mounts and want to prohibit propagation to slaves on restore, we likely want all children of the whole master's share mounted before the slave.

JFYI: this restriction is actually very strict and some cases will fail to restore, for instance (hope nobody does so):

mkdir /test
mount -t tmpfs test /test
mount --make-private /test
mkdir /test/{share,slave}
mount -t tmpfs share /test/share --make-shared
mount --bind /test/share/ /test/slave/
mount --make-slave /test/slave
mount --make-shared /test/slave
mkdir /test/share/slave
mount --bind /test/slave/ /test/share/slave/

cat /proc/self/mountinfo | grep test
524 612 0:69 / /test rw,relatime - tmpfs test rw
570 524 0:73 / /test/share rw,relatime shared:879 - tmpfs share rw
571 524 0:73 / /test/slave rw,relatime shared:942 master:879 - tmpfs share rw
602 570 0:73 / /test/share/slave rw,relatime shared:942 master:879 - tmpfs share rw
603 571 0:73 / /test/slave/slave rw,relatime shared:943 master:942 - tmpfs share rw

Here 603 is a propagation of 602 from master 570 to slave 571, and it is the only way to get such a mount, as 571 and 602 are now in one shared group and all later mounts to them will propagate between them and create duplicated mounts. So to create the real 603 without dups we need to have /test/slave mounted before /test/share/slave, which contradicts the current assumption.

Signed-off-by:
Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
-
Pavel Tikhomirov authored
This test is not automatic because the behaviour changed after kernel v4.11; on older kernels we get a children collision:

  817 188 0:48 / /zdtm/static/unsupported_children_collision.test/share1 rw,relatime shared:942 - tmpfs share rw
> 818 817 0:124 / /zdtm/static/unsupported_children_collision.test/share1/child rw,relatime shared:943 - tmpfs child1 rw
  819 188 0:48 / /zdtm/static/unsupported_children_collision.test/share2 rw,relatime shared:942 - tmpfs share rw
  820 819 0:125 / /zdtm/static/unsupported_children_collision.test/share2/child rw,relatime shared:944 - tmpfs child2 rw
> 821 817 0:125 / /zdtm/static/unsupported_children_collision.test/share1/child rw,relatime shared:944 - tmpfs child2 rw

Signed-off-by:
Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
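
A children collision, as marked with ">" in the listing above, is two mounts with the same parent ID and the same mountpoint. The following is a minimal Python sketch for spotting such pairs in mountinfo text; it is not CRIU's own check, and `find_children_collisions` is a hypothetical name. It relies only on the first five mountinfo fields (mount ID, parent ID, major:minor, root, mount point).

```python
from collections import defaultdict

# The mountinfo lines from the commit message, without the ">" markers.
MOUNTINFO = """\
817 188 0:48 / /zdtm/static/unsupported_children_collision.test/share1 rw,relatime shared:942 - tmpfs share rw
818 817 0:124 / /zdtm/static/unsupported_children_collision.test/share1/child rw,relatime shared:943 - tmpfs child1 rw
819 188 0:48 / /zdtm/static/unsupported_children_collision.test/share2 rw,relatime shared:942 - tmpfs share rw
820 819 0:125 / /zdtm/static/unsupported_children_collision.test/share2/child rw,relatime shared:944 - tmpfs child2 rw
821 817 0:125 / /zdtm/static/unsupported_children_collision.test/share1/child rw,relatime shared:944 - tmpfs child2 rw
"""


def find_children_collisions(mountinfo_text):
    """Return {(parent_id, mountpoint): [mount ids]} for colliding children."""
    by_place = defaultdict(list)
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # not a mountinfo line
        mnt_id, parent_id, mountpoint = int(fields[0]), int(fields[1]), fields[4]
        by_place[(parent_id, mountpoint)].append(mnt_id)
    # Keep only places occupied by more than one mount.
    return {place: ids for place, ids in by_place.items() if len(ids) > 1}


if __name__ == "__main__":
    for (parent_id, mountpoint), ids in find_children_collisions(MOUNTINFO).items():
        print(parent_id, mountpoint, ids)
```

On the sample above the only colliding place is the `share1/child` mountpoint under parent 817, occupied by mounts 818 and 821.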
-
Pavel Tikhomirov authored
See the more detailed explanation in the in-code comment.

note: Actually, before we remove validate_mounts (later in this patchset) we likely won't get to this check and will fail earlier, as having a children collision implies shared mounts with different sets of children.

note: from v4.11 and mainstream kernel commit 1064f874abc0 ("mnt: Tuck mounts under others instead of creating shadow/side mounts.") there will be no more mount collisions.

Signed-off-by:
Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
-
Adrian Reber authored
Signed-off-by:
Adrian Reber <areber@redhat.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
We use fdstore intensively, for example when handling bind-mounted sockets and ghost dgram sockets. The system limit for the per-socket queue may not be enough if someone generates lots of ghost sockets (150 and more, as has been detected on a default Fedora 27). To make it operable, let's unlimit the fdstore queue size on startup. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Adrian Reber authored
Signed-off-by:
Adrian Reber <areber@redhat.com> Acked-by:
Radostin Stoyanov <rstoyanov1@gmail.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
When we are dumping an epoll and one of its target fds has been duped, we can reuse the already collected fds rbtree to find the proper target. We handle it in a lazy way:
- try a plain regular bsearch first; if none of the targets are duped, we checkpoint the epoll immediately
- if bsearch fails, we put this epoll entry into a queue and run its dumping later, when all other files in the process are already dumped. At that moment the fds tree should already have all target files in the rbtree, so we can simply look them up

Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
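
The lazy two-phase lookup described in this commit message can be illustrated roughly as follows. This is a simplified Python sketch under loud assumptions, not the CRIU implementation: a sorted list of fd numbers stands in for the bsearch array, a plain set returned by the hypothetical `collect_all_files` callback stands in for the rbtree, and targets are identified by number only.

```python
from bisect import bisect_left


def in_sorted(sorted_fds, fd):
    """Binary search (the fast path): is fd present in the sorted list?"""
    i = bisect_left(sorted_fds, fd)
    return i < len(sorted_fds) and sorted_fds[i] == fd


def dump_epolls(epolls, sorted_fds, collect_all_files):
    """Dump epolls whose targets are all found at once, defer the rest.

    epolls maps an epoll fd to the list of its target fds.
    collect_all_files is a hypothetical callback returning the set of
    targets known once every other file of the process has been dumped.
    """
    result, queue = {}, []
    for efd, targets in epolls.items():
        if all(in_sorted(sorted_fds, t) for t in targets):
            result[efd] = "immediate"
        else:
            queue.append(efd)  # defer until all other files are dumped

    # Slow path: runs only after everything else has been collected.
    known = collect_all_files() if queue else set()
    for efd in queue:
        missing = [t for t in epolls[efd] if t not in known]
        if missing:
            # Fail loudly instead of silently skipping the target.
            raise LookupError(f"epoll {efd}: targets {missing} not found")
        result[efd] = "deferred"
    return result


if __name__ == "__main__":
    print(dump_epolls({3: [4, 5], 6: [9]}, [3, 4, 5, 6], lambda: {4, 5, 9}))
```

The point of the split is that the common case (no duped targets) costs only a binary search, while the deferred queue is paid for only by the epolls that need it.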
-
Cyrill Gorcunov authored
It is used in files tree generation, so we will need to reuse it for epoll's sake. Also use the whole 64-bit offset to shuffle bits more. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
To find target files with the help of our collected rbtree. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
If we can't find the target file descriptor, we should exit the dump with an error instead of skipping it. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
We will use them for fast lookup of target files. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
To run epoll tests only where it is supported. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
For readability's sake. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
To figure out efd:tfd mapping easier by reading the logs. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
For easier fd match when reading logs Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
When a target file is obtained from epoll fdinfo (internally the kernel keeps only the file _number_), we have to check its identity to make sure it is exactly the one which has been added into the epoll engine. The only proper way is to use the kcmp syscall. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
When we are checkpointing epoll targets we assume that the target file belongs to the process we are on. This is of course not always true. Without kernel support the only thing we can do is compare fd numbers with the ones present in epoll fdinfo. When an fd number matches, we assume that it is indeed the file which has been added into the epoll. This won't cover the case when the file has been moved to some other number and a new one has been opened in its place. Such a scenario will trigger a false positive and we can't do anything about it. In the next patches, with kernel help, we will make a precise check of file identity. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
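
The number-only comparison described in this commit message might look like the sketch below. It is a Python illustration, not CRIU code: `SAMPLE_FDINFO` is made-up sample text modelled on the `tfd:`/`events:`/`data:` layout of epoll fdinfo entries, and `match_targets` is a hypothetical helper name.

```python
import re

# Matches the "tfd:" lines of an epoll fdinfo file; events and data are hex.
TFD_RE = re.compile(
    r"^tfd:\s*(\d+)\s+events:\s*([0-9a-fA-F]+)\s+data:\s*([0-9a-fA-F]+)"
)

# Assumed sample; real entries may carry extra trailing fields.
SAMPLE_FDINFO = """\
pos:\t0
flags:\t02
mnt_id:\t9
tfd:        5 events:       1d data: ffffffffffffffff
tfd:        7 events:       19 data:                7
"""


def epoll_target_fds(fdinfo_text):
    """Extract the target fd numbers listed in epoll fdinfo text."""
    targets = []
    for line in fdinfo_text.splitlines():
        m = TFD_RE.match(line)
        if m:
            targets.append(int(m.group(1)))
    return targets


def match_targets(fdinfo_text, process_fds):
    """Split epoll targets into those whose number is open in the process
    and those whose number is not."""
    matched, missing = [], []
    for tfd in epoll_target_fds(fdinfo_text):
        (matched if tfd in process_fds else missing).append(tfd)
    return matched, missing


if __name__ == "__main__":
    print(match_targets(SAMPLE_FDINFO, {0, 1, 2, 3, 5}))
```

A matching number is only a heuristic: as the message notes, a file moved to another number and replaced by a newly opened one still "matches", which is why an identity check is needed on top of it.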
-
Cyrill Gorcunov authored
In epoll dumping we will need the whole set of fds to investigate the targets, so pass this parameter down to epoll code. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
We will need it to make sure the target files in epolls are present in current process. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Signed-off-by:
Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
- align members
- use pid_t type for PIDs

Signed-off-by:
Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
- switch to using uintX types (just to finally drop uX; it isn't worth carrying this type)
- instead of including the huge util.h, include only the files which are really needed: log, xmalloc, compiler and bug

Signed-off-by:
Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Signed-off-by:
Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Signed-off-by:
Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Signed-off-by:
Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Signed-off-by:
Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Kirill Tkhai authored
Before this patch we used flock to order task creation, but this way is not good. It took 5 syscalls to synchronize the creation of a single child:

1) open()
2) flock(LOCK_EX)
3) flock(LOCK_UN)
4) close() in parent
5) close() in child

The patch introduces a more efficient way of synchronization, which executes only 2 syscalls. We use last_pid_mutex, and the number of syscalls is definitely better.

v2: Don't use flock() at all

Signed-off-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Kirill Tkhai authored
Introduce a mutex to synchronize access to the ns_last_pid file on restore. Signed-off-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Dmitry Safonov authored
I think we should warn the user when we can't C/R compatible applications. That also holds for architectures other than x86. Let's correct the message so that it suits non-x86 as well. Reported-by:
Adrian Reber <areber@redhat.com> Signed-off-by:
Dmitry Safonov <dsafonov@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
vma_area_list@entry=0x818) at criu/cr-dump.c:107
107 list_for_each_entry_safe(vma_area, p, &vma_area_list->h, list)

Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
When a non-root user runs "criu restore" and criu has the suid bit, a process will run with non-zero uid and gid. Before the 4.13 kernel (4d28df6152aa "prctl: Allow local CAP_SYS_ADMIN changing exe_file"), PR_SET_MM_EXE_FILE fails if uid or gid isn't zero. Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Adrian Reber authored
Install sudo, create a test user with ID 1000, install bash, fix pidfile creation and pidfile chmod.

v2:
* use sleep to give the criu daemon some time to start up

v3:
* Andrei is of course right and sleep is not a good solution. After adding --status-fd support to criu service, this is how we now detect that criu is ready.

v4:
* This was much more complicated than expected, which is related to the different versions of the tools on the different travis test targets. There seems to be a bug in bash on Ubuntu https://lists.gnu.org/archive/html/bug-bash/2017-07/msg00039.html which prevents using 'read -n1' on Ubuntu. As a workaround the result from CRIU's status FD is now read via python. Another problem was discovered on alpine with the loop restore test. CRIU says to use setsid even if the process is already using setsid. As a workaround, still with setsid, this process is now using shell-job true for checkpoint and restore.

Parts of v2 have been committed before, so the changes from this commit are partially already in another commit.

Signed-off-by:
Adrian Reber <areber@redhat.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-