- 02 Mar, 2018 21 commits
-
-
Kirill Tkhai authored
Signed-off-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Kirill Tkhai authored
Both branches need this, so move it up. Signed-off-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Kirill Tkhai authored
mntns_get_root_fd() may be called by a task from !root_user_ns, and it fails in that case. Put the root fd into fdstore so that every task can use it. v3: New Signed-off-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Kirill Tkhai authored
CR_PROC_FD_OFF is needed for accessing foreign tasks' fds, and will be used in the future. TRANSPORT_FD_OFF is for uniformity. Signed-off-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Kirill Tkhai authored
I need a named socket to communicate with pid_ns helpers (see next patches) and to receive answers from them (it's impossible to send an answer to an unnamed socket). As we already have a transport socket, we'll reuse it for this goal too. This patch makes transport sockets be created before the creation of children tasks. Also, they are now created not only for alive tasks (so we need additional manipulations for TASK_HELPERS, e.g., to call prepare_fdt()). v5: Return CLONE_FILES clone() argument during task helpers creation. Also get rid of fdt_mutex, as CLONE_FILES processes do not close old files after clone, and we don't have intersections between them. Also, the socket() system call can't return an fd in the service fds range, which was the main reason to have this mutex. Signed-off-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Kirill Tkhai authored
Next patches will create transport sockets in task helpers. As helpers are forked using CLONE_FILES, they must resolve shared fds to create their own service fds. This patch allows that. I've dug through the code, and there is no reason we need pid_rst_prio() when choosing the fdt restorer. So this case may be safely deleted, which guarantees that in the case of TASK_HELPER the restorer of fdt will be the parent, i.e., no TASK_HELPER will be the restorer of fdt. v5: New Signed-off-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Kirill Tkhai authored
This is a refactoring, which will be used in next patches. BUG_ON() is there just to note that the parent must be set before this function is called. v5: New Signed-off-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Kirill Tkhai authored
We close it in sigreturn_restore() for unification with other service fds, so kill the second close() from here. Signed-off-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
This minimizes the chance of hitting a problem where files used for page transfer try to use the same number that is reserved for a service fd. Signed-off-by:
Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
We will need it to lift the limit on file allocation for service fd reserving and later for running parasite code (which is implemented in the vz7 instance and will soon be ported to vanilla). Signed-off-by:
Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Kirill Tkhai authored
It has only one user, so unexport it. Signed-off-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Kirill Tkhai authored
Add a fake fd type for autofs. This allows functions like find_file_desc() to work as expected, without having two different file_desc with the same type and same id. Also, later, it will allow us to delete autofs_create_fle() and use a generic helper. Signed-off-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Pavel Tikhomirov authored
Unchanged test provided by Andrew. Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com> Signed-off-by:
Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Pavel Tikhomirov authored
In case we have mounts:
  1 /mnt/
  2 /mnt/a with parent 1
  3 /mnt/a/b with parent 1
  4 /mnt/a with parent 2
We determine 2 as needing remap with does_mnt_overmount() and remap it. Next we mount 4 on top of 2. Next, in fixup_remap_mounts(), we want to move 2 back to its parent 1, but instead move 4 there. So in this case children-overmounts need to be remapped too. Signed-off-by:
Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Pavel Tikhomirov authored
Remaps in mnt_remap_list should follow the same descending order which was set up in mnt_resort_siblings(), so don't reorder them. For instance, if we have sibling mounts with mountpoints:
  1) /dir1/dir2/dir3
  2) /dir1/dir2
  3) /dir1
Here (2) is a sibling-overmount for (1). Mount (3) is a sibling-overmount for both (1) and (2). So when we move overmounts back in fixup_remap_mounts() we should first move (2) and only then (3). Signed-off-by:
Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Pavel Tikhomirov authored
We should add a new entry _before_ the first entry with less depth to sort in descending order. E.g., entries in the list have depths [7,5,3]; when adding a new entry m with depth 4, we break out of the list_for_each_entry loop on p with depth 3. Before the patch we would get [7,5,3,4] after list_add, which is wrong. Also, we can relax the "<=" check to "<" to avoid unnecessary reordering. Signed-off-by:
Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
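The insertion rule described in this commit message can be sketched in isolation. This is only an illustration under assumptions: `struct remap` and `insert_desc()` are hypothetical names, and a plain singly linked list stands in for the kernel-style `list_head` that the message mentions via list_for_each_entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry of the remap list. */
struct remap {
	int depth;
	struct remap *next;
};

/*
 * Insert m so that the list stays sorted by depth in descending order.
 * We stop at the first entry whose depth is strictly smaller ("<", not
 * "<=") and link m in front of it, so entries of equal depth keep
 * their original relative order.
 */
static void insert_desc(struct remap **head, struct remap *m)
{
	struct remap **pp = head;

	assert(head && m);

	while (*pp && (*pp)->depth >= m->depth)
		pp = &(*pp)->next;

	m->next = *pp;
	*pp = m;
}
```

With depths [7,5,3] already in the list, inserting an entry of depth 4 stops at the entry with depth 3 and links the new entry in front of it, giving [7,5,4,3].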
-
Pavel Tikhomirov authored
Signed-off-by:
Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Pavel Tikhomirov authored
Dump of a VZ7 CT fails if we have an overmounted tmpfs inside:

  [root@silo ~]# prlctl enter su-test-2
  entered into CT
  CT-829e7b28 /# mkdir /mnt/overmntedtmp
  CT-829e7b28 /# mount -t tmpfs tmpfs /mnt/overmntedtmp/
  CT-829e7b28 /# mount -t tmpfs tmpfs /mnt
  CT-829e7b28 /# logout
  [root@silo ~]# prlctl suspend su-test-2
  Suspending the CT...
  Failed to suspend the CT: PRL_ERR_VZCTL_OPERATION_FAILED (Details: Will skip in-flight TCP connections
  (01.657913) Error (criu/mount.c:1202): mnt: Can't open ./mnt/overmntedtmp: No such file or directory
  (01.662528) Error (criu/util.c:709): exited, status=1
  (01.664329) Error (criu/util.c:709): exited, status=1
  (01.664694) Error (criu/cr-dump.c:2005): Dumping FAILED.
  Failed to checkpoint the Container
  All dump files and logs were saved to /vz/private/829e7b28-f204-4bce-b09f-d203b99befd4/dump/Dump.fail
  Checkpointing failed
  )

Criu wants to dump the contents of the /mnt/overmntedtmp/ mount but it is unavailable. So we copy the mount namespace in such a case and unmount overmounts to access what we want to dump.

The actual use case here is dumping a CT with active mariadb and an ssh connection. Together they happen to create such an overmount. By default systemd creates a separate mount namespace for mysql and also mounts tmpfs to /run/user in it, and when ssh(root) is connected, systemd also mounts tmpfs in the container root mount namespace to /run/user/0 for user files. As /run is a slave mount, /run/user/0 also propagates to mysql's mount namespace and initially becomes overmounted by /run/user.

https://jira.sw.ru/browse/PSBM-57362

remove __maybe_unused for mnt_is_overmounted and umount_overmounts

changes in v2:
  1) Use clone not fork, share resources with parent same as in call_in_child_process.
  2) Do not enter userns (create helper) for non-overmounted mounts. Thus return back setns/resorens logic.
  3) Helper opens fd for parent directly due to CLONE_FILES, remove futex.
  4) Check helper exit status properly.
  5) Add get_clean_fd helper.
  6) Add better comments.
changes in v3:
  1) Pass fd from helper through args instead of ret code, fix ret code checking.
  2) Add \n to pr_err in open_mountpoint
changes in v5: Make comments even better. Signed-off-by:
Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Pavel Tikhomirov authored
also remove __maybe_unused for __umount_children_overmounts note: leave it __maybe_unused yet Signed-off-by:
Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Pavel Tikhomirov authored
note: leave it __maybe_unused yet Signed-off-by:
Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Pavel Tikhomirov authored
note: leave it __maybe_unused yet Signed-off-by:
Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
- 15 Feb, 2018 19 commits
-
-
Andrei Vagin authored
It was dropped during one of the rebases. Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Dmitry Safonov authored
We should continue even if kdat feature isn't supported:
  [criu]# ./criu/criu dump -t `pidof pypy` --shell-job
  Warn (criu/kerndat.c:804): Can't load /run/criu.kdat
  Warn (criu/libnetlink.c:55): ERROR -95 reported by netlink
  Error (criu/net.c:3042): Unable to create a veth pair: -95
  Warn (criu/net.c:3064): NSID isn't reported for network links
Cc: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by:
Dmitry Safonov <0x7f454c46@gmail.com>
-
Andrei Vagin authored
The original idea was to set --empty net for criu dump and criu restore, but before cde33dcb ("empty-ns: Don't C/R iptables too (v2)"), criu restore worked without --empty net and we didn't notice that docker doesn't set this option on restore. After a small brainstorm, we decided that it is better to remove this requirement. Docker has to set this option, but with these changes the docker issue will be less urgent. https://github.com/checkpoint-restore/criu/issues/393 Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Pavel Emelyanov authored
There's an if (bad_thing) { ret = -1; break; } block above this hunk, whose intention is to propagate -1 back to the caller. This propagation is obviously broken. Signed-off-by:
Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
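The commit message does not say how exactly the propagation broke, so the following is only a sketch of one common way this pattern goes wrong and how to keep the error. All names here (`handle_item()`, `finish()`, `handle_all()`) are hypothetical and are not taken from the CRIU sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-item step: fails on negative input. */
static int handle_item(int v)
{
	return v < 0 ? -1 : 0;
}

/* Hypothetical final step that always succeeds. */
static int finish(void)
{
	return 0;
}

static int handle_all(const int *items, size_t n)
{
	int ret = 0;
	size_t i;

	assert(items || n == 0);

	for (i = 0; i < n; i++) {
		if (handle_item(items[i])) {
			ret = -1;
			break;
		}
	}

	/*
	 * An unconditional "ret = finish();" here would overwrite the -1
	 * set inside the loop and hide the failure from the caller.
	 * Only let finish() turn a success into a failure.
	 */
	if (finish() && !ret)
		ret = -1;

	return ret;
}
```

The point of the sketch is that any assignment to `ret` after the loop has to preserve an earlier failure.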
-
Andrei Vagin authored
runc restore executes criu with --emptyns network and sets a setup-namespaces script to restore a network namespace. https://github.com/xemul/criu/issues/314 Looks-good-to: Pavel Emelyanov <xemul@virtuozzo.com> Reviewed-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Fixes: 2189b9c71d3d ("net: allow to dump and restore more than one network namespace") Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Kirill Tkhai authored
Picked from patch "[PATCH RFC] namespaces: use CLONE_VFORK with CLONE_VM when it is possible" by Andrew Vagin. Currently the parent touches the child's stack, as at the moment of the clone() call its stack pointer is above the child's (we allocate char stack[128] on the parent's stack). This prevents creating CLONE_VM|CLONE_VFORK processes, because the child uses stack addresses occupied by the parent. The patch changes clone_noasan() behaviour and allows doing that with the same memory consumption. We give the child memory which is not used by the parent's clone(), so the parent's and child's stacks have no intersection. This allows creating CLONE_VM|CLONE_VFORK processes. Signed-off-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
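The requirement that the stacks must not intersect can be illustrated with a small Linux-only sketch. It does not reproduce the clone_noasan() change: instead of carving memory out of the parent's stack as the patch describes, it gives the child a separately mmap()ed stack, which is a simpler way to get the same non-overlap. `child_fn()`, `run_vfork_child()` and `CHILD_STACK_SIZE` are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)

/* Runs in the child; the write is visible to the parent due to CLONE_VM. */
static int child_fn(void *arg)
{
	*(int *)arg = 42;
	return 0;
}

/* Returns 0 if the child ran and exited with status 0, -1 otherwise. */
static int run_vfork_child(int *out)
{
	char *stack;
	pid_t pid;
	int status, ret = -1;

	assert(out);

	/* A private mapping cannot overlap the parent's own stack. */
	stack = mmap(NULL, CHILD_STACK_SIZE, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
	if (stack == MAP_FAILED)
		return -1;

	/* The stack grows down on common architectures, so pass its top. */
	pid = clone(child_fn, stack + CHILD_STACK_SIZE,
		    CLONE_VM | CLONE_VFORK | SIGCHLD, out);
	if (pid > 0 && waitpid(pid, &status, 0) == pid &&
	    WIFEXITED(status) && WEXITSTATUS(status) == 0)
		ret = 0;

	munmap(stack, CHILD_STACK_SIZE);
	return ret;
}
```

Because of CLONE_VFORK the parent is suspended until the child exits, and because of CLONE_VM both share one address space, so a child stack that overlapped the parent's live frames would corrupt them.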
-
Kirill Tkhai authored
(Was "user_ns: Block SIGCHLD during namespaces generation") We don't want an asynchronous signal handler during creation of namespaces (for example, in create_user_ns_hierarhy()), as we do wait() synchronously. So we need to block the signal. Do this once globally. v2: Set initial ret = 0 v3: Block signal globally in root_item before its children are created. v4: Move block to prepare_namespace() Suggested-by:
Andrew Vagin <avagin@virtuozzo.com> Signed-off-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
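Blocking a signal so that children can be reaped with a synchronous wait() is done with the standard POSIX signal-mask calls. The sketch below shows the general technique only; `block_sigchld()` and `sigchld_is_blocked()` are hypothetical helper names and this is not the code added to prepare_namespace().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/*
 * Add SIGCHLD to the calling thread's blocked set. If old is not NULL,
 * the previous mask is stored there so it can be restored later.
 * Returns 0 on success, -1 on error.
 */
static int block_sigchld(sigset_t *old)
{
	sigset_t mask;

	if (sigemptyset(&mask) || sigaddset(&mask, SIGCHLD))
		return -1;

	return sigprocmask(SIG_BLOCK, &mask, old);
}

/* Returns 1 if SIGCHLD is currently blocked, 0 if not, -1 on error. */
static int sigchld_is_blocked(void)
{
	sigset_t cur;

	/* A NULL set leaves the mask unchanged and only queries it. */
	if (sigprocmask(SIG_BLOCK, NULL, &cur))
		return -1;

	return sigismember(&cur, SIGCHLD);
}
```

While the signal is blocked, no SIGCHLD handler runs asynchronously, so a plain wait() or waitpid() in the same code path collects the exited child.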
-
Kirill Tkhai authored
The action is run in a very lightweight process. Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Kirill Tkhai authored
In next patches usernsd will need to create a transport socket in the same net_ns where other tasks create their TRANSPORT_FD_OFF sockets. Choose criu net_ns for that: this allows usernsd not to wait for the creation of other net_ns, i.e. not to introduce new dependencies between tasks. In case of (root_ns_mask & CLONE_NEWUSER) != 0 root_item's user_ns does not allow restoring criu net_ns, so do prepare_net_namespaces() in a sub-process so as not to lose criu net. v3: Introduce __prepare_net_namespaces and execute it in cloned task. Signed-off-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Kirill Tkhai authored
Since net ns is assigned after prepare_fds() and, in the common case, at the moment of the open_ns_fd() call the task points to a net ns which differs from its target net ns, we can't get the ns from a task. So, get it from fdstore. Also, support userns ns fds. v2: Add comment Signed-off-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
-
Andrei Vagin authored
We have a test case for external veth devices. This test case checks veth devices which live in two dumped network namespaces. Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
A network device, which is connected to a bridge, is restored after the bridge. In this case we can set the master attribute and the device will be connected to the bridge automatically. Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
We will need to enumerate links a few times. Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
It's a preparation for enumerating links a few times. Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
When we dump a veth device, the kernel reports where a peer device lives and we use this information to restore this veth pair. On restore we set a net ns id for a peer and it is created in the required netns. v2: add more comments Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
It will be used to restore links in different net namespaces. Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
In each network namespace we can set an id for another network namespace to be able to address it in netlink messages. For example, we can say that a peer of a veth device has to be created in a network namespace with a specified id. If we request information about a veth device, the kernel will report where the peer device lives. A user is able to set these IDs, so we have to dump and restore them. v2: add more comments v3: make a union of nsfd_id and ns_fd, they are not used together Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
It will be used to dump netns id-s too. Signed-off-by:
Andrei Vagin <avagin@virtuozzo.com>
-