Commit af55c059 authored by Andrew Vagin, committed by Pavel Emelyanov

mount: fix a race between restoring namespaces and file mappings (v2)

Currently, to get the root of a namespace we wait until that namespace
is restored. We need the namespace root to open a file when restoring a
memory mapping.

A process restores its mappings and only then forks its children. So we
can end up needing to open a file from a namespace that will only be
"restored" by one of our children, and waiting for that namespace would
never finish.

The root task restores all mount namespaces and opens a file descriptor
for each of them. With this patch the root task also opens the root of
each mntns.
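
As a rough illustration of what "opening the root" means here, the sketch below opens the calling process's root directory through procfs and checks that the descriptor refers to the same inode as `/`. The helper names `open_own_root` and `fd_matches_path` are hypothetical and are not part of CRIU; the patch itself uses CRIU's `open_proc(PROC_SELF, "root")` after entering the target namespace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open this process's root directory via procfs.
 * Returns a directory fd on success, -1 on failure. */
static int open_own_root(void)
{
	return open("/proc/self/root", O_RDONLY | O_DIRECTORY);
}

/* Hypothetical helper: 1 if fd and path name the same inode,
 * 0 if they differ, -1 on error. */
static int fd_matches_path(int fd, const char *path)
{
	struct stat a, b;

	assert(path != NULL);
	if (fstat(fd, &a) < 0 || stat(path, &b) < 0)
		return -1;
	return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

A descriptor obtained this way keeps pointing at the namespace's root even after the opener switches back to another mount namespace, which is why the root task can hold one per mntns.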

If we need to get the root of a namespace which isn't populated yet, we
can get it from the root task. After the CR_STATE_FORKING stage, the
root task closes all namespace descriptors, and we know that all
namespaces are populated at that moment.
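
Getting the root "from the root task" relies on procfs exposing each open descriptor of a process as `/proc/<pid>/fd/<n>`. The sketch below shows that mechanism in isolation, assuming a Linux system with procfs mounted; `reopen_fd_of` and `same_inode` are hypothetical names, not CRIU functions. Opening another process's descriptor this way is subject to the usual procfs permission checks.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reopen directory descriptor `fd` held by
 * process `pid`. Returns a new fd in the caller, or -1 on failure. */
static int reopen_fd_of(pid_t pid, int fd)
{
	char path[64];
	int n;

	assert(fd >= 0);
	n = snprintf(path, sizeof(path), "/proc/%d/fd/%d", (int)pid, fd);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return open(path, O_RDONLY | O_DIRECTORY);
}

/* Hypothetical helper: 1 if both fds name the same inode,
 * 0 if they differ, -1 on error. */
static int same_inode(int a, int b)
{
	struct stat sa, sb;

	if (fstat(a, &sa) < 0 || fstat(b, &sb) < 0)
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

In the patch the `pid` is the root task's and `fd` is the stored `root_fd`; a quick local check can pass `getpid()` and a descriptor opened on `/` instead.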

v2: don't close root_fd for root ns, because it was not opened
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
parent a9be7621
@@ -38,6 +38,7 @@ struct ns_id {
 			struct mount_info *mntinfo_list;
 			struct mount_info *mntinfo_tree;
 			int ns_fd;
+			int root_fd;
 		} mnt;
 		struct {
......
@@ -2972,6 +2972,8 @@ void fini_restore_mntns(void)
 		if (nsid->nd != &mnt_ns_desc)
 			continue;
 		close(nsid->mnt.ns_fd);
+		if (nsid->type != NS_ROOT)
+			close(nsid->mnt.root_fd);
 	}
 }
@@ -3179,6 +3181,8 @@ int prepare_mnt_ns(void)
 			nsid->mnt.ns_fd = open_proc(PROC_SELF, "ns/mnt");
 			if (nsid->mnt.ns_fd < 0)
 				goto err;
+			/* we set ns_populated so we don't need to open root_fd */
+			futex_set(&nsid->ns_populated, 1);
 			continue;
 		}
@@ -3199,6 +3203,11 @@ int prepare_mnt_ns(void)
 		if (nsid->mnt.ns_fd < 0)
 			goto err;
+		/* root_fd is used to restore file mappings */
+		nsid->mnt.root_fd = open_proc(PROC_SELF, "root");
+		if (nsid->mnt.root_fd < 0)
+			goto err;
 		/* And return back to regain the access to the roots yard */
 		if (setns(rst, CLONE_NEWNS)) {
 			pr_perror("Can't restore mntns back");
@@ -3289,15 +3298,33 @@ set_root:
 int mntns_get_root_fd(struct ns_id *mntns) {
 	/*
-	 * We need to find a task from the target namespace and open its root.
-	 * For that we need to wait when one of tasks enters into required
-	 * namespaces.
+	 * All namespaces are restored from the root task and during the
+	 * CR_STATE_FORKING stage the root task has two file descriptors for
+	 * each mntns. One is associated with a namespace and another one is a
+	 * root of this mntns.
+	 *
+	 * When a non-root task is forked, it enters into a proper mount
+	 * namespace, restores private mappings and forks children. Some of
+	 * these mappings can be associated with files from other namespaces.
 	 *
-	 * The root task is born in the root mount namespace.
+	 * After the CR_STATE_FORKING stage the root task has to close all
+	 * mntns file descriptors to restore its descriptors and at this moment
+	 * we know that all tasks live in their mount namespaces.
+	 *
+	 * If we find that a mount namespace isn't populated, we can get its
+	 * root from the root task.
 	 */
-	if (mntns->type != NS_ROOT)
-		futex_wait_while_eq(&mntns->ns_populated, 0);
+	if (!futex_get(&mntns->ns_populated)) {
+		int fd;
+		fd = open_proc(root_item->pid.virt, "fd/%d", mntns->mnt.root_fd);
+		if (fd < 0)
+			return -1;
+		return mntns_set_root_fd(mntns->ns_pid, fd);
+	}
 	return __mntns_get_root_fd(mntns->ns_pid);
 }
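
The behavioural change in this last hunk is that the lookup no longer blocks on `ns_populated`; it reads the flag once and picks a source for the root. A minimal sketch of that decision follows, using C11 atomics as a stand-in for CRIU's futex helpers. `struct demo_mntns`, `enum root_source` and `pick_root_source` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Where the namespace root would be taken from. */
enum root_source {
	ROOT_FROM_NS_TASK,	/* namespace populated: use a task living in it */
	ROOT_FROM_ROOT_TASK,	/* not populated yet: borrow the root task's fd */
};

/* Hypothetical stand-in for the relevant part of struct ns_id. */
struct demo_mntns {
	atomic_int populated;	/* plays the role of ns_populated */
};

/* Read the flag once, never wait: the caller may be the very task
 * whose future child is supposed to populate the namespace. */
static enum root_source pick_root_source(struct demo_mntns *ns)
{
	assert(ns != NULL);
	return atomic_load(&ns->populated) ? ROOT_FROM_NS_TASK
					   : ROOT_FROM_ROOT_TASK;
}
```

With the flag at 0 the sketch selects the root task's descriptor, which mirrors the new `open_proc(root_item->pid.virt, "fd/%d", ...)` branch; once the flag is set it falls through to the existing path.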
......