Commits · 3b71b95a190cfea5ecef73b059fe942bef485e7d · zhul / criu

02 Mar, 2018 40 commits

x86/crtools: Add fork() err-path handle · 3b71b95a

Dmitry Safonov authored Feb 07, 2018

Add the missing error path for a failed fork().
It looks like it was originally forgotten, oops!
Also print a message when fork() fails.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
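
For illustration only (this is not the CRIU code touched by the commit), a minimal sketch of a fork() call with an explicit error path. The helper name `run_in_child()` and the sample callback are hypothetical:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sample callback: the child exits with code 3. */
int child_exits_with_3(void)
{
	return 3;
}

/*
 * Hypothetical helper (not from CRIU): run fn() in a child process
 * and return its exit code, or -1 on any failure.
 */
int run_in_child(int (*fn)(void))
{
	int status;
	pid_t pid = fork();

	if (pid < 0) {
		/* The error path: report the failure instead of ignoring it. */
		fprintf(stderr, "Can't fork: %s\n", strerror(errno));
		return -1;
	}

	if (pid == 0)
		_exit(fn());

	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status);
}
```

A caller can then treat -1 as "the helper could not be run" rather than confusing it with a result from the child.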

inotify: Fix open_*notify_fd() never fails · f89aa7f3

Kirill Tkhai authored Feb 08, 2018

We ignore the restore_one_*notify() error code, but we must not.
Make the open function fail when a restore fails.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

inotify: Do not DDOS by debug message on restore watch descriptor · 23dad982

Kirill Tkhai authored Feb 08, 2018

Imagine, we have to restore inotify with watch descriptor 0x34d71d6.
Then we have:

1.235021     5578: fsnotify:           Watch got       0x1 but 0x34d71d6 expected
...
...
527.378042   5578: fsnotify:           Watch got 0x34d71d3 but 0x34d71d6 expected
527.378042   5578: fsnotify:           Watch got 0x34d71d4 but 0x34d71d6 expected
527.378042   5578: fsnotify:           Watch got 0x34d71d5 but 0x34d71d6 expected

Stop doing this and stop generating GBs of debug messages.
We already print a message before restore_one_inotify().
Let's add just one more after it.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
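
The idea can be sketched as follows. This is an illustration under assumptions, not the CRIU implementation: `next_wd()` is a hypothetical stand-in for the call that allocates a watch descriptor, and `restore_wd()` is a hypothetical name. The loop itself stays silent and a single summary line is printed once it finishes:

```c
#include <stdio.h>

/* Hypothetical stand-in for the allocator: hands out increasing ids. */
int next_wd(int *counter)
{
	return ++(*counter);
}

/*
 * Loop until the allocator returns the expected descriptor.
 * Nothing is printed per iteration; one line is printed at the end.
 * Returns the number of attempts, or -1 if the target was overshot.
 */
long restore_wd(int *counter, int expected)
{
	long attempts = 0;
	int wd;

	do {
		wd = next_wd(counter);
		attempts++;
		if (wd > expected) {
			fprintf(stderr, "Watch got 0x%x but 0x%x expected\n",
				(unsigned)wd, (unsigned)expected);
			return -1;
		}
	} while (wd != expected);

	fprintf(stderr, "Watch 0x%x restored after %ld attempts\n",
		(unsigned)wd, attempts);
	return attempts;
}
```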

Fix typos · 3b338435

Radostin Stoyanov authored Feb 07, 2018

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

zdtm: Add scm06 test · c670abb8

Kirill Tkhai authored Feb 01, 2018

This test creates looped unix socket queues and tries
to iterate over them after the restore.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

files: Allow to send unix sockets over unix sockets · 75be843c

Kirill Tkhai authored Jan 30, 2018

Everything is ready. Message queues are now restored in
the second stage of open for all types of unix sockets.
We just need to make scm wait before restore_unix_queue()
and allow dumping such an scm context.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

unix: Move dump_sk_queue() before peer resolution · fdb7bde5

Kirill Tkhai authored Jan 30, 2018

Once we allow unix sockets to be sent over unix sockets,
dump_sk_queue() may dump and resolve some peers.
So we need to run it first and avoid linking our
peer_node to the peer's peer_list.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

unix: Add fake queuer for standalone dgram sockets · 254958ce

Kirill Tkhai authored Jan 30, 2018

Similar to the previous patch, this keeps the second end
of a dgram socketpair open until post open. This
allows the restore of the message queue to be delayed.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

unix: Add fake queuer for standalone stream sockets in established state · d8ae190e

Kirill Tkhai authored Jan 30, 2018

This keeps the second end of the socketpair alive until post_open.
We need it alive if we want to restore the message queue later.
Otherwise, we have no queuer, whose fd is used to actually
write the messages.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

unix: Split collect_one_unixsk() · aae41a63

Kirill Tkhai authored Jan 30, 2018

Extract the part that performs socket memory initialization.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

files: Implement find_unused_file_desc_id() · 7a5919cc

Kirill Tkhai authored Jan 30, 2018

This function will be used to allocate ids for fake files
(not to be confused with fake fds, i.e. fles).
Suggested-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

unix: Postpone restore_sk_common() of standalone sockets · 22371247

Kirill Tkhai authored Jan 30, 2018

restore_sk_common() may shut down a socket, after which the
queuer won't be able to connect to it. So this action must
be postponed.

We have had this problem for a long time, but we were lucky
not to run into it.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

unix: Make unix_sk_info::queuer pointer · b0493135

Kirill Tkhai authored Jan 30, 2018

Use pointer to the queuer instead of its id.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

unix: Move queue restore of interconnected pair to post open · f44b5422

Kirill Tkhai authored Jan 30, 2018

There are no functional changes. We just postpone
the restore of the queues. This will be used in later
patches to restore unix sockets sent over unix sockets.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

unix: Rework peer transfer in open_unixsk_pair_master() · 58ec2ddf

Kirill Tkhai authored Jan 30, 2018

After the previous patch, the master and slave ends of a socketpair
are owned by a single task. So we can avoid using
send_desc_to_peer() for the second end and just
reopen it with the right pid.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

files: Export setup_and_serve_out() · 6648656d

Kirill Tkhai authored Jan 30, 2018

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

unix: Add fake interconnected sockets · 0f6f2246

Kirill Tkhai authored Jan 30, 2018

We're going to split interconnected pair restore
into two stages. Since we need the second end
to restore the message queue in (future) post open,
we add it to the process that owns the first
end.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

sk-unix: Remove unused code in interconnected_pair() · cc0c57d9

Kirill Tkhai authored Jan 30, 2018

Since the new file engine was introduced, we don't care
which particular pid is the master or the slave.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

unix: Generalize get_fle_for_scm() · 60f47e91

Kirill Tkhai authored Jan 30, 2018

This adds a new argument and changes debug print
(it will be used for any fle, not only for scm).
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

files: Add force_master argument to collect_desc_fle() · de8318c3

Kirill Tkhai authored Jan 30, 2018

This functionality allows making a fle a master at
collection time. We will use it to add fake
files when we need to do so after add_fake_fds_masters().

This will be used to add the second end of a socketpair as
a fake fle (since the first end is already in the right
place, we will force-add the second end there).
See the next patches.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

unix: Move post_open_unix_sk() to open_unixsk_standalone() and rename it · de27a7c6

Kirill Tkhai authored Jan 30, 2018

Since this function is used by standalone sockets only,
we move it to the appropriate place. No functional changes.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

files: Allow epolls sent over unix socket · 8377abeb

Kirill Tkhai authored Jan 30, 2018

Since epoll restore is split into two parts,
epoll_create() does not depend on the state
of other files. Once an epoll is created, it
can be sent anywhere. So there are
no circular dependencies, and we allow epolls
to be sent over a unix socket.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

zdtm: Add scm05 test · 7d1eeee7

Kirill Tkhai authored Jan 30, 2018

Create a socketpair and an epoll. Add one end of the socketpair
to the epoll and then send the epoll twice over the other end.

After restore, check that the epoll can be received
via the socket and that it contains the event.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

travis: don't fail a build when the s390 job failed · 4ca19114

Andrei Vagin authored Feb 02, 2018

Builds for s390x fail due to a qemu bug.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

zdtm: fix a lint warning · 7f936f3b

Andrei Vagin authored Feb 01, 2018

$ make lint
flake8 --config=scripts/flake8.cfg test/zdtm.py
test/zdtm.py:323:19: F841 local variable 'e' is assigned to but never used
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

zdtm: Fix fd01 cleanup · 4fe32e5d

Kirill Tkhai authored Feb 01, 2018

waitpid() does not return a child's pid while the child has not exited.
So we can't use it to find the pids of children.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
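
The behavior can be observed with a small sketch (illustrative only, not the fd01 test code; the function name `poll_running_child()` is hypothetical). With WNOHANG, waitpid() reports 0 for a child that is still running:

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that blocks in pause(), then poll it with WNOHANG.
 * Returns what waitpid() reports while the child is still running,
 * or -1 on error.
 */
int poll_running_child(void)
{
	pid_t pid, ret;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		pause();
		_exit(0);
	}

	/* The child has not exited, so this does not yield its pid. */
	ret = waitpid(pid, NULL, WNOHANG);

	/* Clean up: terminate the child and reap it for real. */
	kill(pid, SIGKILL);
	if (waitpid(pid, NULL, 0) != pid)
		return -1;

	return (int)ret;
}
```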

mount: fix cwd_fd leak on clone error · f48bf982

Pavel Tikhomirov authored Jan 29, 2018

We should close cwd_fd on error paths, found by Coverity Scan:

*** CID 187162:  Resource leaks  (RESOURCE_LEAK)
/criu/mount.c: 1370 in open_mountpoint()
1364                     */
1365                    pid = clone_noasan(ns_open_mountpoint, CLONE_VFORK | CLONE_VM
1366                                    | CLONE_FILES | CLONE_IO | CLONE_SIGHAND
1367                                    | CLONE_SYSVSEM, &ca);
1368                    if (pid == -1) {
1369                            pr_perror("Can't clone helper process");
>>>     CID 187162:  Resource leaks  (RESOURCE_LEAK)
>>>     Handle variable "cwd_fd" going out of scope leaks the handle.
1370                            return -1;
1371                    }
1372
1373                    errno = 0;
1374                    if (waitpid(pid, &status, __WALL) != pid || !WIFEXITED(status)
1375                                    || WEXITSTATUS(status)) {
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
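
A common way to avoid this class of leak is a single exit label that always closes the descriptor. The sketch below is an assumption-laden illustration, not the actual change to open_mountpoint(): `with_cwd_fd()`, `failing_step()` and `seen_fd` are hypothetical names, and `step()` stands in for the clone and wait sequence from the report:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical sample step: always fails and remembers the fd it got. */
int seen_fd = -1;

int failing_step(int cwd_fd)
{
	seen_fd = cwd_fd;
	return -1;
}

/*
 * Open the current directory, run a step that may fail, and release
 * the descriptor on every exit path.
 */
int with_cwd_fd(int (*step)(int cwd_fd))
{
	int ret = -1;
	int cwd_fd = open(".", O_RDONLY);

	if (cwd_fd < 0)
		return -1;

	if (step(cwd_fd) < 0)
		goto out;	/* the error path still reaches close() */

	ret = 0;
out:
	close(cwd_fd);
	return ret;
}
```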

mount: fix uninitialized use of fd on switch_ns error · 455b96c0

Pavel Tikhomirov authored Jan 29, 2018

close_safe() can operate on an uninitialized fd if switch_ns fails.
Found by Coverity Scan:

*** CID 187164:  Uninitialized variables  (UNINIT)
/criu/mount.c: 1313 in open_mountpoint()
1307     err:
1308            return 1;
1309     }
1310
1311     int open_mountpoint(struct mount_info *pm)
1312     {
>>>     CID 187164:  Uninitialized variables  (UNINIT)
>>>     Declaring variable "fd" without initializer.
1313            int fd, cwd_fd, ns_old = -1;
1314
1315            /* No overmounts and children - the entire mount is visible */
1316            if (list_empty(&pm->children) && !mnt_is_overmounted(pm))
1317                    return __open_mountpoint(pm, -1);
1318
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

zdtm: suppress useless error messages · 901d0de9

Andrei Vagin authored Jan 28, 2018

Start test
./mxcsr --pidfile=mxcsr.pid --outfile=mxcsr.out
Run criu dump
Unable to kill 44: [Errno 3] No such process <--------------- this one
Run criu restore
Run criu dump
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>

criu: fix two issues with possible out-of-bound access · 9c93e0d3

Andrei Vagin authored Jan 26, 2018

Signed-off-by: Andrei Vagin <avagin@openvz.org>

bfd: avoid out-of-bound access · ecbec8be

Andrei Vagin authored Jan 26, 2018

Write a null byte only if there is enough space for it.

Cc: Stephen Röttger <stephen.roettger@gmail.com>
Reported-by: Stephen Röttger <stephen.roettger@gmail.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
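
The check amounts to comparing the data length against the buffer capacity before storing the terminator. A minimal sketch, with the hypothetical helper name `terminate_if_room()` (this is not the bfd code itself):

```c
#include <stddef.h>

/*
 * Terminate buf (capacity size) after len bytes of data, but only
 * if the terminator fits. Returns 0 on success, -1 if there is no room.
 */
int terminate_if_room(char *buf, size_t size, size_t len)
{
	if (len >= size)
		return -1;	/* writing buf[len] would be out of bounds */

	buf[len] = '\0';
	return 0;
}
```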

files: change error to warning in linkat_hard error path · 0d9d2712

Pavel Tikhomirov authored Jan 17, 2018

Callers of linkat_hard already print an error in every failure case, but
some errors, such as EEXIST, are acceptable and simply skipped, so we
should not print an error here.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

zdtm: check ghost restores on readonly fs if it is also a ghost in other writable bind · 9ffe2f33

Pavel Tikhomirov authored Jan 17, 2018

This is a test for the convert_path_from_another_mp fix. It is a bit
tricky, as we don't fully support ghosts on a readonly fs, only the case
where the ghost can be remapped on some _other_ bind mount (luckily we
have the same ghost on another bind). Moreover, the wrong absolute path
generated by the old convert_path_from_another_mp for linkat doesn't
always fail. It fails only when we want to do linkat on a mount in an
_other_ mountns, the absolute path makes us do it in the local mountns,
and the local path is readonly. =)

v2: remove unused headers
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

files: make convert_path_from_another_mp always return relative path · f7934aa3

Pavel Tikhomirov authored Jan 17, 2018

If dmi->ns_mountpoint is "/", then dst will contain "/...", an
absolute path, but here we want a path relative to the dmi mount. Adding
"./" before the path guarantees that it will always be relative.

https://jira.sw.ru/browse/PSBM-72351
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
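
The effect of the "./" prefix can be sketched as below. This is an illustration under assumptions and not the body of convert_path_from_another_mp: the helper name `path_relative_to_mount()` and its signature are hypothetical, and it assumes the path lies under the mount point:

```c
#include <stdio.h>
#include <string.h>

/*
 * Strip the mount point mnt from path and always prefix "./",
 * so the result is relative even when mnt is "/".
 * Returns 0 on success, -1 if path is not under mnt or dst is too small.
 */
int path_relative_to_mount(char *dst, size_t size,
			   const char *path, const char *mnt)
{
	size_t off = strlen(mnt);
	const char *rest;
	int n;

	if (strncmp(path, mnt, off) != 0)
		return -1;

	/* Drop any slashes left at the start of the remainder. */
	rest = path + off;
	while (*rest == '/')
		rest++;

	n = snprintf(dst, size, "./%s", rest);
	if (n < 0 || (size_t)n >= size)
		return -1;

	return 0;
}
```

A path that starts with "./" can be passed to the *at() family together with a mount's directory fd without being resolved from the root.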

zdtm: Add fd01 test · 248a04e0

Kirill Tkhai authored Jan 10, 2018

Fork tasks and create fds with different numbers.
Some children share files with the parent (CLONE_FILES).
Check that we can suspend and resume in this case.

v2: New
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

files: Make tasks set their own service_fd_base · 64ae0364

Kirill Tkhai authored Jan 10, 2018

Currently, we set rlim(RLIMIT_NOFILE) to unlimited
and use service_fd_rlim_cur to place service fds.
This leads to a significant problem: every task uses
the biggest possible files_struct in the kernel, and
it consumes excess memory after restore
compared to dump. In some situations this
may end in a restore failure, as there is not enough
memory in the memory cgroup or on the node.

The patch fixes the problem by introducing
a per-task service_fd_base. It is calculated
from the maximum used file fd and is placed
near the right border of the kernel-allocated memory
chunk for the task's fds (see alloc_fdtable() for
details). This reduces the kernel-allocated files_struct
to 512 fds for most processes on a standard Linux
system (I've analysed the processes on my work system).

Also, since the "standard processes" will have the same
service_fd_base, clone_service_fd() won't have to
actually dup() their service fds for them as it
does at the moment. This is one of the reasons why
we still keep service fds as a range of fds
and do not try to use unused holes in task fds.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>

v2: Add a handle for very big fd numbers near service_fd_rlim_cur.
v3: Fix excess accounting for nr equal to pow 2 minus 1.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

files: Prepare clone_service_fd() for overlapping ranges. · 928d116d

Kirill Tkhai authored Jan 10, 2018

Normally this is impossible. But with a big fdt::nr
(many processes sharing the same files) and a custom
service_fd_base, a normal (!CLONE_FILES) child of such
a process may have service fds that overlap with the
parent's fdt. This patch introduces "memmove()" behavior
(currently there is "memcpy()" behavior), which will
be used in the next patch.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
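
The difference between the two behaviors is the iteration order when the source and destination ranges overlap. A minimal sketch on a plain array, where each slot stands in for one service fd and 0 marks a free slot. The function name `move_slots()` is hypothetical and this is a model of the idea, not the clone_service_fd() code:

```c
#include <stddef.h>

/*
 * Move nr slots inside table from index "from" to index "to".
 * The direction is chosen as in memmove(), so that no source slot
 * is overwritten before it has been moved.
 */
void move_slots(int *table, size_t from, size_t to, size_t nr)
{
	size_t i;

	if (from == to)
		return;

	if (to < from) {
		/* Destination is lower: walk front to back. */
		for (i = 0; i < nr; i++) {
			table[to + i] = table[from + i];
			table[from + i] = 0;
		}
	} else {
		/* Destination is higher: walk back to front. */
		for (i = nr; i-- > 0; ) {
			table[to + i] = table[from + i];
			table[from + i] = 0;
		}
	}
}
```

With a fixed front-to-back order, which is all the "memcpy()" behavior needs, moving to a higher overlapping range would overwrite source slots that have not been moved yet.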

files: Refactor clone_service_fd() · 6c4b0de0

Kirill Tkhai authored Jan 10, 2018

This patch just moves part of clone_service_fd()
to a separate function, which improves the readability of the code.

There are no functional changes, only refactoring.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

files: Do setup_newborn_fds() later · 37b99ebe

Kirill Tkhai authored Jan 10, 2018

This patch moves the service fds relocation call to after
root_prepare_shared()->prepare_fd_pid(). Next patches
will make service_fd_base depend on the task's max used fd,
and for root_item we need to read all fles to know
the maximum of them.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>

files: Close old service fd in clone_service_fd() · d076328c

Kirill Tkhai authored Jan 10, 2018

Next patches will make service_fd_base not constant.
It will be "floating" and change from task to task.
This patch prepares for that: it closes
the old service fd after it is duplicated.

Currently the code is unused, since in the case of
!(rsti(me)->clone_flags & CLONE_FILES) the child
has the same id as its parent, and the duplication
just does not occur.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
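
The "duplicate, then close the old one" step can be sketched as follows. The helper name `move_fd()` is hypothetical and this is not the clone_service_fd() code; it only shows the pattern, including the case where old and new positions coincide and nothing is duplicated:

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Move a descriptor from old_fd to new_fd: duplicate it, then close
 * the old number so it does not stay open at the previous position.
 * Returns new_fd on success, -1 on error.
 */
int move_fd(int old_fd, int new_fd)
{
	if (old_fd == new_fd)
		return new_fd;	/* same position: nothing to duplicate */

	if (dup2(old_fd, new_fd) < 0)
		return -1;

	close(old_fd);
	return new_fd;
}
```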
