    files: Make tasks set their own service_fd_base · 64ae0364
    Kirill Tkhai authored
    Currently, we set rlim(RLIMIT_NOFILE) to unlimited
    and use service_fd_rlim_cur to place service fds.
    This leads to a significant problem: every task uses
    the biggest possible files_struct in the kernel, so
    after restore it consumes more memory than it did
    at dump time. In some situations the restore may
    even fail because there is not enough memory in the
    memory cgroup or on the node.
    
    The patch fixes the problem by introducing a per-task
    service_fd_base. It is calculated from the task's
    maximum used fd and is placed near the right border
    of the memory chunk the kernel allocates for the
    task's fds (see alloc_fdtable() for details). This
    reduces the kernel-allocated files_struct to 512 fds
    for most processes on a standard Linux system (I
    analysed the processes on my work system).
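The kernel's `alloc_fdtable()` sizes the table in a fixed way. It divides the requested fd number by the number of pointers that fit in 1024 bytes. It then rounds that quotient plus one up to a power of two and multiplies back. The sketch below mirrors that rounding in userspace and puts a block of service fds at the very end of the resulting table, so they fit without growing it. This is a minimal illustration, not CRIU's code. `SERVICE_FDS`, `SERVICE_PAD` and the helper names are hypothetical, and the `sysctl_nr_open` cap is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* fd pointers per 1024-byte chunk: 128 with 8-byte pointers. */
#define FDS_PER_KB ((unsigned int)(1024 / sizeof(void *)))

/* Hypothetical sizes, not CRIU's real constants. */
#define SERVICE_FDS 16u	/* service fds reserved per task */
#define SERVICE_PAD 16u	/* gap kept above the task's own fds */

/* Smallest power of two >= x, like the kernel's roundup_pow_of_two(). */
static unsigned int roundup_pow2(unsigned int x)
{
	unsigned int p = 1;

	assert(x > 0);
	while (p < x)
		p <<= 1;
	return p;
}

/*
 * Slots the kernel allocates so that fd number `fd` fits, mirroring the
 * rounding in alloc_fdtable(); the sysctl_nr_open cap is ignored here.
 */
static unsigned int fdtable_slots(unsigned int fd)
{
	return roundup_pow2(fd / FDS_PER_KB + 1) * FDS_PER_KB;
}

/*
 * Hypothetical per-task base: size the table for the task's own fds plus
 * the pad and the service range, then put the service fds at its very end
 * so that they never make the kernel allocate a bigger table.
 */
static unsigned int choose_service_fd_base(unsigned int max_used_fd)
{
	unsigned int top = max_used_fd + SERVICE_PAD + SERVICE_FDS;

	return fdtable_slots(top) - SERVICE_FDS;
}
```

For example, with 8-byte pointers a task whose highest fd is 300 gets a 512-slot table and a base of 496. That is the same table size fd 300 alone would already require.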
    
    Also, since "standard processes" will have the same
    service_fd_base, clone_service_fd() won't have to
    actually dup() their service fds, as it does at the
    moment. This is one of the reasons why we still keep
    service fds as a contiguous range of fds instead of
    trying to use unused holes among the task's fds.
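When two tasks share the same `service_fd_base`, relocating service fds becomes a no-op. The sketch below is a hypothetical helper, not CRIU's `clone_service_fd()`. It moves a range of service fds with `dup2()`, preserves their fd flags, and returns early without any `dup()` when the bases match. It assumes the old and new ranges do not overlap.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not CRIU's clone_service_fd()): move up to `n`
 * service fds from [old_base, old_base + n) to [new_base, new_base + n).
 * Assumes the two ranges do not overlap. Returns the number of fds that
 * were duplicated, or -1 on error.
 */
static int move_service_fds(int old_base, int new_base, int n)
{
	int moved = 0;

	/* Same base: every service fd is already where it must be, no dup(). */
	if (old_base == new_base)
		return 0;

	assert(old_base + n <= new_base || new_base + n <= old_base);

	for (int i = 0; i < n; i++) {
		int old_fd = old_base + i, new_fd = new_base + i;
		int flags = fcntl(old_fd, F_GETFD);

		if (flags < 0) {
			if (errno == EBADF)
				continue;	/* this service fd is not open */
			return -1;
		}
		/* dup2() clears FD_CLOEXEC on the copy, so restore the flags. */
		if (dup2(old_fd, new_fd) < 0 || fcntl(new_fd, F_SETFD, flags) < 0)
			return -1;
		close(old_fd);
		moved++;
	}
	return moved;
}
```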

    Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
    
    v2: Add handling for very big fd numbers near service_fd_rlim_cur.
    v3: Fix excess accounting for nr equal to a power of 2 minus 1.
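The v3 note concerns values of the form 2^k - 1. The sketch below illustrates one way such excess can arise; the patch's actual code is not shown here, so this is an assumption. The kernel rounds the chunk count plus one with `roundup_pow_of_two()`, which returns the smallest power of two not below its input. The sketch contrasts that with a hypothetical "strictly greater" helper. The two agree except when `nr + 1` is already a power of two, that is, when `nr` is a power of 2 minus 1. In that case the strict version doubles the table.

```c
#include <assert.h>

/* Kernel-style roundup_pow_of_two(): smallest power of two >= x (x > 0). */
static unsigned int roundup_pow2(unsigned int x)
{
	unsigned int p = 1;

	assert(x > 0);
	while (p < x)
		p <<= 1;
	return p;
}

/* Hypothetical faulty variant: smallest power of two strictly > x. */
static unsigned int next_pow2_strict(unsigned int x)
{
	unsigned int p = 1;

	while (p <= x)
		p <<= 1;
	return p;
}

/* 1024-byte chunks the kernel allocates when the highest fd is in chunk `nr` (0-based). */
static unsigned int table_chunks(unsigned int nr)
{
	return roundup_pow2(nr + 1);
}

/* Same computation with the strict helper: over-counts when nr == 2^k - 1. */
static unsigned int table_chunks_strict(unsigned int nr)
{
	return next_pow2_strict(nr + 1);
}
```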
    Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>