Commits · e7e9ece13e5a86f5e8a0f8254e3f35f00d277a8c · zhul / criu

06 Sep, 2016 14 commits

files: don't create a transport socket for each file ++ · e7e9ece1

Pavel Emelyanov authored Aug 10, 2016

The same thing as in e46ba886 exists in pipes, unix sockets and
ttys, so let's re-use the service transport fd there as well.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

cgroup: treat memory.oom_control specially too · 54253380

Tycho Andersen authored Aug 09, 2016

Similar to f444f7fac40, we need to treat memory.oom_control as a "special"
property and try not to write its default value, since in the 3.11 kernel
it can't be written when memory.use_hierarchy is true, which is the
default.

CC: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

zdtm: check a case when a root of sub-mntns is read-only · 379215e1

Andrei Vagin authored Aug 09, 2016

It's what we have when ReadOnlyDirectories=/ is set for systemd services.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

mount: don't create a temporary directory if /tmp exists · 0661a8a4

Andrei Vagin authored Aug 09, 2016

pivot_root requires a directory to move the old root into. Currently
a temporary directory is created for that, but this doesn't
work if the / directory is read-only.

In fact, any existing directory will do. With this patch,
criu tries to use /tmp and creates a temporary directory
only if /tmp doesn't exist.

https://bugs.openvz.org/browse/OVZ-6778

v2: don't give a constant string to mkdtemp
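A minimal sketch of the "/tmp first, mkdtemp() only as a fallback" choice described above. pick_put_old() and the .criu.pivot.XXXXXX template are hypothetical names, not CRIU's actual code; the sketch only picks the directory and leaves the pivot_root() call itself out. It also shows why the v2 note matters: mkdtemp() rewrites its template in place.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not CRIU's code: pick the directory that pivot_root()
 * moves the old root into. An existing <root>/tmp is reused, so nothing has
 * to be created and a read-only / is fine; only when it is missing is a new
 * directory made with mkdtemp(). Returns 0 and fills buf, or -1 on error.
 */
static int pick_put_old(const char *root, char *buf, size_t len)
{
	struct stat st;
	int n;

	n = snprintf(buf, len, "%s/tmp", root);
	if (n < 0 || (size_t)n >= len)
		return -1;
	if (stat(buf, &st) == 0 && S_ISDIR(st.st_mode))
		return 0;

	/*
	 * mkdtemp() rewrites the XXXXXX suffix in place, so the template must
	 * live in a writable buffer, never in a string constant.
	 */
	n = snprintf(buf, len, "%s/.criu.pivot.XXXXXX", root);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return mkdtemp(buf) ? 0 : -1;
}
```

Reusing an existing directory means the read-only case never has to write to /; the temporary directory is only a fallback for roots that lack /tmp.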
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

travis: turn on alpine build · 043fd0ee

Dmitry Safonov authored Aug 08, 2016

Cc: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Acked-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

mem: generalize page_in_parent function and make it extern · 5d128a49

Fyodor Bocharov authored Aug 07, 2016

To dedup anon shared memory we need to call page_in_parent,
so it has to be extern.
Also, anon shared memory has only 1 bit per page, so we have to
change the page_in_parent signature.
Signed-off-by: Fyodor Bocharov <fbocharov@yandex.ru>
Signed-off-by: Eugene Batalov <eabatalov89@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

page-read/xfer: add ability to open image hierarchy for shmem · ea8b5dab

Fyodor Bocharov authored Aug 07, 2016

In order to restore deduplicated anonymous shared memory we need
to open its parent pagemap images. Code that opens parent pagemap
images already exists for anonymous private memory. All we need to do
is remove a couple of checks from the existing code. We also need to rename
pid to id, because now we can pass either a pid or a shmid and the actual
meaning depends on pr_flags.
Signed-off-by: Fyodor Bocharov <fbocharov@yandex.ru>
Signed-off-by: Eugene Batalov <eabatalov89@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

proc_parse: collect longest shared vma size · 740305be

Fyodor Bocharov authored Aug 07, 2016

To dedup anon shared memory we need to know the size of the longest shared
vma, so that we can create a page cache of the appropriate size when dumping pages.
Signed-off-by: Fyodor Bocharov <fbocharov@yandex.ru>
Signed-off-by: Eugene Batalov <eabatalov89@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

ipc/sysctl: c/r kernel.{msg_next_id,sem_next_id,shm_next_id} · f960e432

Pavel Tikhomirov authored Jul 19, 2016

These are the only three entries left in ipc_kern_table that we haven't
checkpointed yet. I'm not sure anybody except criu really uses them,
but for consistency it's better not to change them
across c/r.

v3: do one sysctl_op for all xxx_next_id (as sysctl_op is quite slow)
v4: do only one sysctl_op in ipc_sysctl_req
v5: handle msg*_default only if /proc/sys/fs/mqueue exists, same as the
other ones from fs/mqueue
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

test: add printf-attribute to test_msg · df81b884

Dmitry Safonov authored Jul 12, 2016

...and fix misprints that weren't caught before.

I guess I've never fixed so many (possible) bugs in one commit, heh.
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

sigframe: prepare macro helpers for two sigframes · c220f6da

Dmitry Safonov authored Jun 28, 2016

As compat and native sigframes differ on x86, I need to generalize/modify
the sigframe macro helpers, keeping in mind that:
- SIGFRAME_OFFSET differs between native and compat tasks, so it now takes
  a sigframe parameter, which will be used in the following patches
  (it is also renamed to RT_SIGFRAME_OFFSET to complement the other macros)
- RT_SIGFRAME_FPU now yields a pointer, because every caller takes the
  result's address with &RT_SIGFRAME_FPU(...)
- sigreturn_prep_fpu_frame now takes an rt_sigframe parameter, as the
  address of the fpu_state pointer on x86 will depend on the native/compat
  frame type, so I check the local sigframe's type and compute the address
  for rsigframe (see the very next commit).

Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Christopher Covington <cov@codeaurora.org>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

arm/restorer: fix {,rt_}sigframe typo · 3a0c6fdd

Dmitry Safonov authored Jun 28, 2016

sigreturn_prep_fpu_frame is a no-op on arm, but I think it's better
to fix the typo even though the macro currently expands correctly in
sigframe.c: that may change in the future.

Cc: Christopher Covington <cov@codeaurora.org>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

sigframe: introduce SIGFRAME_MAX_OFFSET · e625ca72

Dmitry Safonov authored Jun 28, 2016

On x86, SIGFRAME_OFFSET differs for native and compat
tasks. In the next patches I will turn it into a SIGFRAME_OFFSET(rt_sigframe)
macro (depending on rt_sigframe).

As RESTORE_STACK_SIGFRAME is used only for allocation sizes,
I don't want to make it depend on the rt_sigframe type as
RESTORE_STACK_SIGFRAME(rt_sigframe). Let's just use the maximum
sigframe offset of native and compat tasks for this purpose.

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

restorer: implement restore of ps tree in stopped state · def23744

Kravchenko Dmitrii authored May 02, 2016

This patch implements the --leave-stopped option
of the CRIU restore command. If --leave-stopped is passed, then
each process in the ps tree gets SIGSTOP before CRIU detaches
from it.
Signed-off-by: Kravchenko Dmitrii <equivalence1@gmail.com>
Signed-off-by: Eugene Batalov <eabatalov89@gmail.com>
Acked-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

15 Aug, 2016 1 commit

criu: Version 2.5 · c0314172

Pavel Emelyanov authored Aug 15, 2016

Mostly a bug-fix release.

We've also come very close in the -dev branch to having x86 32-bit
support, so hopefully we'll have it in 2.6/2.7. Lazy restore is
now in a testable state, but we still want the kernel patches to
leave the maintainer's tree, so we're still waiting.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

11 Aug, 2016 20 commits

files-reg: find appropriate dir to create link remap · 5d4cf626

Egor Gorbunov authored Aug 09, 2016

Currently, during criu dump we create the link remap in the same dir
where the original file was opened. But that dir may not exist at
link remap creation time. At the same time, it's okay to create the link
remap in any dir on the same mount point.
This patch does exactly that: we check the existence of every dir bottom
up along the original file path and use the first one that exists.

A similar approach is used on criu restore during ghost file creation.
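The bottom-up search above can be sketched as follows. deepest_existing_dir() is a hypothetical helper, not CRIU's code; it assumes an absolute path and, unlike a real implementation, does not check that the directory it settles on is still on the original file's mount point.

```c
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: truncate an absolute path, in place, to the deepest
 * ancestor directory that still exists, checking each directory bottom up
 * from the original file's location. "/" is the last resort.
 * Note: this sketch does not check mount point boundaries.
 */
static void deepest_existing_dir(char *path)
{
	struct stat st;
	char *slash;

	while ((slash = strrchr(path, '/')) != NULL) {
		if (slash == path) {	/* only "/" is left above us */
			path[1] = '\0';
			return;
		}
		*slash = '\0';		/* drop the last path component */
		if (stat(path, &st) == 0 && S_ISDIR(st.st_mode))
			return;
	}
}
```

For a file opened as /a/b/c/file where /a/b/c has since been removed but /a/b still exists, the path is cut back to /a/b, and the link remap would be created there.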
Signed-off-by: Egor Gorbunov <egor-mailbox@ya.ru>
Signed-off-by: Eugene Batalov <eabatalov89@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

zdtm: check permissions for map_files · fad23922

Andrew Vagin authored Aug 08, 2016

Cc: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Test-for: b67d37d96fa0 ("proc_parse: fix vma file open mode recognition")
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

make: propagate config DEFINES in CFLAGS · 90284132

Dmitry Safonov authored Aug 08, 2016

The problem:
the $(DEFINES) array is added to $(CFLAGS) in the global Makefile.
But in criu/Makefile we include Makefile.config, which
adds feature-based config options to $(DEFINES).
We need to propagate these new defines to the CFLAGS array again.

Previously, I added:
ccflags-y		+= $(DEFINES)
to Makefile.crtools, but those $(DEFINES) are useful not only
in the crtools makefile.

Let's just propagate these feature defines to CFLAGS and DEFINES
in place.

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

make: drop check-build-packages - we can use try-cc now · 1d72d801

Dmitry Safonov authored Aug 08, 2016

Adjust all calls to try-cc to compare with 'true' instead of 'y',
and drop the additional check-build-packages rule, as we can now check
the packages in the rule itself.

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

build/nmk: simplify try-cc and return true/false · 34996312

Dmitry Safonov authored Aug 08, 2016

- simplify: don't use a temporary file -- use /dev/null instead
- return 'true' or 'false' -- this way we can use it inside rules
without an external call to bash to compare the result with 'y', see
the next patch for a use case.

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

vma: rename longest to priv_longest · ce5a01f7

Fyodor Bocharov authored Aug 07, 2016

The 'longest' field in struct vma_area_list stores the longest private vma
size. It is better to name it priv_longest, as is done
for the priv_size field.
Signed-off-by: Fyodor Bocharov <fbocharov@yandex.ru>
Signed-off-by: Eugene Batalov <eabatalov89@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

test/zdtm_ct: fix comments spacing · e71b877e

Mike Rapoport authored Aug 07, 2016

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

fault-inject: Check a case when parasite can't initialize a command socket · fafdfb47

Andrew Vagin authored Aug 05, 2016

Give a fake address for a server socket.
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

systemd-autofs-restart.sh: do not treat absence of bindmount as error · 6e89dbbf

Stanislav Kinsburskiy authored Aug 05, 2016

There can be an autofs direct mount point without a target mount on top.
In this case there won't be any bindmount and nothing to restore on top of the
autofs mount point.
Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

tcp: don't leak a file descriptor · 034bfe16

Andrew Vagin authored Aug 04, 2016

CID 164719 (#1 of 1): Resource leak (RESOURCE_LEAK)
7. leaked_handle: Handle variable sk going out of scope leaks the handle.
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

files-reg: discover file system type only if file is going to be dumped · 290e238d

Stanislav Kinsburskiy authored Aug 02, 2016

Do this only for files being dumped, instead of in fill_fd_params_special
(which is called for every path found).
This reduces the number of system calls on dump.
Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Acked-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

contrib/docker_cr.sh: fix a typo in a comment · f35f8a6e

Kir Kolyshkin authored Aug 04, 2016

comand -> command
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

images/tty.proto: fix typo in a comment · 0eac990d

Kir Kolyshkin authored Aug 04, 2016

presense -> presence
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

criu/*c: fix typos · dee52730

Kir Kolyshkin authored Aug 04, 2016

In comments:
 cant -> can't
 irrelevent -> irrelevant
 sucess -> success
 prepartion -> preparation
 recepient -> recipient
 lenght -> length
 hexidecimal -> hexadecimal
 becuase -> because
 responce -> response
 controll -> control
 existance -> existence
 alltogether -> altogether
 comparision -> comparison
 immediatly -> immediately
 happenned -> happened
 allready -> already
 simplier -> simpler
 succesfully -> successfully
 absense -> absence

In debug messages:
 Transfering -> Transferring

In error messages:
 reponse -> response
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

criu/include: fix typos in comments · ffed8bb7

Kir Kolyshkin authored Aug 04, 2016

Initilize -> Initialize
immediatelly -> immediately
carefull -> carefully
transfering -> transferring
descriptrs -> descriptors
transfered -> transferred
comparision -> comparison
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

ppc64: fix typos · 42cc0ff6

Kir Kolyshkin authored Aug 04, 2016

comming	-> coming
puting	-> putting
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

Syscall tables: fix typos · 27a59a65

Kir Kolyshkin authored Aug 04, 2016

Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

criu/Makefile: fix a typo · 30a5b1f4

Kir Kolyshkin authored Aug 04, 2016

Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

COPYING: fix a typo in a preamble · d7325fc6

Kir Kolyshkin authored Aug 04, 2016

Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

prepare_pstree: fixup reading kernel pid_max · cb58aa84

Kir Kolyshkin authored Aug 04, 2016

Two fixes (reported by Coverity) and a minor nitpick:

1. Fix checking error from open_proc().

2. Fix a buffer overflow. ULONG_MAX can be 20 characters long, so
ret = read() can return 20 and buf[ret] = 0 will overrun buf.
Make buf one character longer (an extra byte for \0) and pass
sizeof(buf) - 1 to read() to fix it.

3. Call close() right after read().
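The three fixes above can be illustrated with a small sketch. read_ulong_fd() is a hypothetical helper, not the code from prepare_pstree; it assumes the caller has already opened /proc/sys/kernel/pid_max (or any similar file) and passes the fd in, possibly negative if the open failed.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper mirroring the fixes: parse an unsigned long from an
 * already opened fd. A 64-bit ULONG_MAX has 20 decimal digits, so buf
 * reserves one extra byte for '\0' and read() is capped at
 * sizeof(buf) - 1; buf[ret] can then never overrun.
 */
static int read_ulong_fd(int fd, unsigned long *val)
{
	char buf[21];		/* 20 digits + terminating '\0' */
	ssize_t ret;
	char *end;

	if (fd < 0)		/* fix 1: the caller's open may have failed */
		return -1;

	ret = read(fd, buf, sizeof(buf) - 1);
	close(fd);		/* fix 3: close right after read() */
	if (ret <= 0)
		return -1;
	buf[ret] = '\0';	/* fix 2: ret <= 20, so this stays inside buf */

	errno = 0;
	*val = strtoul(buf, &end, 10);
	if (errno != 0 || end == buf)
		return -1;
	return 0;
}
```

The trailing newline that /proc files usually carry is harmless here: strtoul() simply stops parsing at it.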

This is a fixup to commit e68bded.

Reported by Coverity, CID 168505, 168504.

Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

08 Aug, 2016 5 commits

seize: Wait for the freezer to complete before processing tasks · c44683c1

Cyrill Gorcunov authored Aug 02, 2016

Currently, when we use the cgroup freezer to seize the tasks, we start the
freezer and then, without waiting for the transition to complete, we seize
the tasks read from the freezer @tasks file using fgets.

This is a fragile construction: fgets uses an internal buffer, and the tasks
we've read might be exiting while we're freezing them. The kernel won't
freeze these exiting tasks because they are dying anyway, and I fear we
might read a pid here which is not even in our cgroup anymore but has been
reused by another task outside the cgroup.

Thus let's do the following: freeze the tasks in iterations, waiting
for the freezer to change its state, and then collect/seize all tasks
in one pass.

For example, on the container I'm playing with it takes just one iteration:

 | (00.013690) cg: Set 1 is criu one
 | (00.013705) freezing processes: 1800000 attempst with 100 ms steps
 | (00.013720) freezer.state=THAWED
 | (00.013795) freezer.state=FREEZING
 | (00.113962) freezer.state=FROZEN
 | (00.113990) freezing processes: 1 attempts done
 | (00.114073) SEIZE 240893 (comm systemd): success
 | (00.114110) Warn  (ptrace.c:121): Unable to interrupt task: 240905 (comm kthreadd/1) (Operation not permitted)
 | (00.114136) Warn  (ptrace.c:121): Unable to interrupt task: 240906 (comm khelper) (Operation not permitted)
 | (00.114155) SEIZE 240969 (comm screen): success
 | (00.114166) SEIZE 240970 (comm sendmail): success
 | (00.114179) SEIZE 240971 (comm sendmail): success
 | (00.114189) SEIZE 240972 (comm saslauthd): success
 | (00.114202) SEIZE 240973 (comm crond): success
 | (00.114211) SEIZE 240974 (comm agetty): success
 | (00.114221) SEIZE 240975 (comm agetty): success
 | ...
Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
Acked-by: Andrew Vagin <avagin@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

seize: Take --timeout option into account when freezing processes · 9fae23fb

Cyrill Gorcunov authored Aug 02, 2016

When we're freezing processes we don't count on anything but
rather do 5 attempts with a constantly increasing delay.

Let's do the following instead:

 - take the --timeout option into account (which is 5 seconds
   by default) and split it into 100 ms steps;

 - when freezing processes, check the freezer status every 100 ms.

At the same time, 5 seconds by default looks too small
for highly loaded containers. Let's increase the default
to 10 seconds.
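A minimal sketch of the fixed-step polling described above. wait_frozen() and state_is_frozen() are hypothetical helpers, not CRIU's seize code; the sketch only polls a freezer.state-like file (it does not write FROZEN to start the transition), and treating --timeout 0 as "check exactly once" is an assumption made here.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define FREEZE_STEP_MS 100	/* fixed step from the commit message */

/* Hypothetical reader: true once the freezer.state file says FROZEN. */
static bool state_is_frozen(const char *state_path)
{
	char buf[32] = "";
	FILE *f = fopen(state_path, "r");

	if (!f)
		return false;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';
	fclose(f);
	return strncmp(buf, "FROZEN", 6) == 0;
}

/*
 * Split the --timeout budget (seconds) into fixed 100 ms steps instead of
 * a few attempts with growing delays. At least one check is always made,
 * so --timeout 0 still looks at the state once (an assumption of this
 * sketch). Returns the number of attempts used, or -1 on timeout.
 */
static long wait_frozen(const char *state_path, unsigned long timeout_sec)
{
	const struct timespec step = { 0, FREEZE_STEP_MS * 1000000L };
	unsigned long steps = timeout_sec * 1000 / FREEZE_STEP_MS;
	unsigned long i;

	if (steps == 0)
		steps = 1;
	for (i = 0; i < steps; i++) {
		if (state_is_frozen(state_path))
			return (long)(i + 1);
		if (i + 1 < steps)	/* no pointless sleep after the last check */
			nanosleep(&step, NULL);
	}
	return -1;
}
```

With the new 10-second default this gives 100 checks, and the worst-case overshoot after the cgroup actually freezes is one 100 ms step rather than an ever-growing delay.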

[ skinsbursky@:
Frankly speaking, increasing intervals are not nice in this particular case.
This is not a network issue or something.
Usually freezing takes less than a second, but more than, say, 200ms.
Otherwise it takes quite a lot of time.
If the step size keeps growing, in most cases criu will
waste hundreds of milliseconds between iterX (say, the third) and iterX+1
because of the growing step size.
A 100ms step looks solid enough: not too small to produce a lot of syscalls
and not too large to waste a lot of time.
With the previous scheme, freezing usually took half a second longer than
it should because of this growing step.

[ gorcunov@:
You won't believe it, but being able to specify --timeout 0 here allowed
Stas and me to catch a serious problem in the freezer code.

https://lkml.org/lkml/2016/8/3/317

Without this feature we would have had to patch criu instead. So, you know,
it would be great to keep it for catching more kernel bugs ;)
Reported-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

log: Print version on startup · 5a43e55e

Cyrill Gorcunov authored Aug 02, 2016

For debugging purposes.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

files: don't create a transport socket for each file · e46ba886

Andrew Vagin authored Jul 29, 2016

This is a unix dgram socket which doesn't have an address and
isn't connected anywhere, so we can use one socket for all processes.

v2: return non-zero code in error cases
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

zdtm.py: check for link remap files presence on test end · 3e840917

Stanislav Kinsburskiy authored Aug 02, 2016

These files have to be removed after a successful restore.

v2:
Check link remap files only for tests with the "--link-remap" option in
their descriptor.
Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
