files: Make tasks set their own service_fd_base

Currently, we set rlim(RLIMIT_NOFILE) to unlimited and use service_fd_rlim_cur to place service fds. This leads to a significant problem: every task uses the biggest possible files_struct in the kernel, so it consumes excess memory after restore in comparison to dump. In some situations this may make the restore fail, as there is not enough memory in the memory cgroup or on the node.

The patch fixes the problem by introducing a per-task service_fd_base. It is calculated from the task's maximum used file fd and is placed near the right border of the kernel-allocated memory chunk for the task's fds (see alloc_fdtable() for details). This reduces the kernel-allocated files_struct to 512 fds for most processes in a standard Linux system (I've analysed the processes on my work system).

Also, since the "standard processes" will have the same service_fd_base, clone_service_fd() won't have to actually dup() their service fds for them as we do at the moment. This is one of the reasons why we still keep service fds as a range of fds and do not try to use unused holes in the task's fds.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>

v2: Add handling for very big fd numbers near service_fd_rlim_cur.
v3: Fix excess accounting for nr equal to a power of 2 minus 1.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>