- 19 Sep, 2014 7 commits
- Tycho Andersen authored: Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Tycho Andersen authored: If a file like /proc/20/mountinfo is open, but 20 is a zombie (or doesn't exist any more), we can't read this file at all, so a link remap won't work. Instead, we add a new remap, called the dead process remap, which forks a TASK_HELPER with that dead pid so that the restore task can open the new /proc/20/mountinfo instead. This commit also adds a new stage, CR_STATE_RESTORE_SHARED. Since new TASK_HELPERS are added when loading the shared resource images, we need to wait until these resources are loaded before forking tasks. v2: fix a mutex bug Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Tycho Andersen authored: In order to use TASK_HELPERS to open files from dead processes, they should persist until criu is done restoring the filesystem, which happens in the RESTORE stage. To do this, we need to pass the helpers' PIDs to the restorer blob, so that it can wait() on them when the restore stage is done. This commit is in preparation for the remap_dead_pid commits. v2: wait() on helpers after restore stage is over v3: add CR_STATE_RESTORE_FS stage v4: CR_STATE_RESTORE_FS waits for nr_tasks + nr_helpers, not nr_threads v5: ditch CR_STATE_RESTORE_FS in favor of passing helpers to restorer blob Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Andrey Vagin authored: Currently there is a bug: when we see criu's mount namespace, we jump to the "out" label and don't validate mounts. Reported-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrey Vagin <avagin@openvz.org> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Pavel Emelyanov authored: The same reasoning as for the personality file: switch to plain open + read + close. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Pavel Emelyanov authored: It turned out that fdopen (used in fopen_proc) always maps a 4k buffer for reads, and this buffer gets unmapped later on fclose. Taking into account the number of proc files we read (~20 per task plus one file per open file descriptor), these mmap+munmap calls waste quite a lot of CPU time. E.g. for a container of 20 tasks we have 1000 calls taking ~8% of total dump time. So let's first stop doing this for the simple cases: one-line proc files. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Cyrill Gorcunov authored: So it won't depend on the order of declaration. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
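
A minimal sketch of the plain open + read + close approach from the fopen_proc commit above, for one-line files. The helper name read_one_line() is hypothetical and is not CRIU's actual function. The sketch assumes the whole line arrives in a single read(). That holds for the short proc files in question, but it is not guaranteed for arbitrary files.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the first line of a small file (e.g. a
 * one-line /proc file) into buf using plain open + read + close, so no
 * stdio buffer is set up and torn down as with fdopen()/fclose().
 * Assumes one read() returns the whole line.
 * Returns the line length (without the trailing newline) or -1.
 */
static ssize_t read_one_line(const char *path, char *buf, size_t size)
{
	int fd;
	ssize_t n;
	char *nl;

	if (size == 0)
		return -1;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Leave room for the terminating NUL. */
	n = read(fd, buf, size - 1);
	close(fd);
	if (n < 0)
		return -1;

	buf[n] = '\0';
	nl = strchr(buf, '\n');
	if (nl)
		*nl = '\0';	/* keep only the first line */
	return (ssize_t)strlen(buf);
}
```

The caller supplies buf, typically on the stack, so no per-file buffer has to be allocated and released.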
 
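For the commit that keeps TASK_HELPERS alive until the restore stage is over, here is a rough sketch of the idea: record each helper's PID, then wait() on exactly those PIDs. It rests on simplifying assumptions. The helpers here are ordinary fork()ed children that block on a pipe until released, and the same process does the waiting. In CRIU the waiting is done by the restorer blob, which this does not model. spawn_helpers() and wait_helpers() are hypothetical names.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical: fork up to n helpers, storing their PIDs in pids[].
 * Each helper blocks until the parent closes *release_fd (standing in
 * for "persist until the filesystem is restored").
 * Returns the number of helpers actually started, or -1 if pipe() fails.
 */
static int spawn_helpers(pid_t *pids, int n, int *release_fd)
{
	int p[2], i;

	if (pipe(p) < 0)
		return -1;

	for (i = 0; i < n; i++) {
		pids[i] = fork();
		if (pids[i] < 0)
			break;	/* pids[0..i-1] are still valid */
		if (pids[i] == 0) {
			char c;

			/* Helper: hold on until every write end is closed. */
			close(p[1]);
			while (read(p[0], &c, 1) > 0)
				;
			_exit(0);
		}
	}
	close(p[0]);
	*release_fd = p[1];
	return i;
}

/* Reap exactly the recorded helper PIDs; returns how many exited with 0. */
static int wait_helpers(const pid_t *pids, int n)
{
	int i, ok = 0, status;

	for (i = 0; i < n; i++) {
		pid_t r;

		do
			r = waitpid(pids[i], &status, 0);
		while (r < 0 && errno == EINTR);

		if (r == pids[i] && WIFEXITED(status) && WEXITSTATUS(status) == 0)
			ok++;
	}
	return ok;
}
```

Usage: call spawn_helpers(), do the work that needs the helpers, then close the returned fd and call wait_helpers() on the same PID array.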
- 18 Sep, 2014 13 commits
- Pavel Emelyanov authored: They don't change these objects, so they can share them with the parent (and will be created slightly faster :) ). The plan is to make them CLONE_VM, but that's not so easy. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Pavel Emelyanov authored: When cloning kids we can put their stack on the current one, since it will be COW-ed later anyway. One thing to note: we do need to reserve some space on the stack for glibc's arguments and return-code allocation. 128 bytes is enough for 16 pointers, while clone has only 5 arguments. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Pavel Emelyanov authored: Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Pavel Emelyanov authored: Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Tycho Andersen authored: Maintain backwards compatibility with old images, but stop setting the REMAP_GHOST bit going forward and use only the remap_type field. v2: * preserve remap_id in the GHOST_REMAP case * the protobuf field is a remap_type enum, not u32 Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Ruslan Kuprieiev authored: In cr_dump_tasks() we expect restore_root_task to return < 0 if an error occurs. Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Cyrill Gorcunov authored: If there is no separator in the first place, we should avoid the implicit + 1, which makes @name = 1 in the worst case. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Matthias Neuer authored: My Debian testing produces the following output for uname: "$ uname -r" prints 3.14-2-amd64, and so after "$ set -- `uname -r | sed 's/\./ /g'`", "$ echo $1" prints 3 and "$ echo $2" prints 14-2-amd64. This causes zdtm.sh to fail for me on line 293: [ $1 -eq 3 -a $2 -ge 11 ] && return 0, because "14-2-amd64 -ge 11" is false. Signed-off-by: Matthias Neuer <matthias.neuer@uni-ulm.de> Reviewed-by: Christopher Covington <cov@codeaurora.org> Acked-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Ruslan Kuprieiev authored: This makes images modifiable only by root by default. When criu is used with the suid bit set, the group of the images is set to the user's group, which is not safe given the current CR_FD_PERM. Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Cyrill Gorcunov authored: If @ticks is zero, the kernel returns an error, because on creation @ticks is already zero; so set up @ticks only if a real value is present. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Pavel Emelyanov authored: PTRACE_SYSCALL traps the task twice: first on entry into the syscall and then on exit from it. If we trace a single task (and on dump we do this twice per task), we can skip half of all getregs calls, since we don't need them on exit. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Andrew Vagin <avagin@parallels.com>
- Cyrill Gorcunov authored: Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Andrey Vagin authored: We have the same check a few lines above. v2: fix the subject Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
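
The separator fix from the 18 Sep commits comes down to not doing "strchr(...) + 1" unconditionally. When strchr() returns NULL, adding 1 produces the bogus pointer value 1 for @name. Below is a hypothetical helper showing the guarded form. Returning NULL when there is no separator is this sketch's own choice; the commit does not say what the caller does instead.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical: return a pointer just past the first sep in s, or NULL
 * if s has no separator. The buggy pattern is
 *     name = strchr(s, sep) + 1;
 * which yields (char *)1 when strchr() returns NULL.
 */
static const char *value_after_sep(const char *s, char sep)
{
	const char *p = strchr(s, sep);

	if (!p)
		return NULL;	/* no separator: skip the implicit + 1 */
	return p + 1;
}
```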
 
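The zdtm.sh failure reported on 18 Sep comes from splitting "3.14-2-amd64" on dots only, which leaves "14-2-amd64" as the minor number. The actual fix is in the shell script. As an illustration of the same idea in C: parsing with sscanf("%d.%d") stops at the first non-digit, so distribution suffixes are ignored. The function names are hypothetical. The check mirrors only the quoted condition "[ $1 -eq 3 -a $2 -ge 11 ]", not the rest of the script.

```c
#include <stdio.h>

/* Parse "major.minor" from a release string such as "3.14-2-amd64". */
static int kernel_version(const char *release, int *major, int *minor)
{
	/* %d stops at the first non-digit, so "-2-amd64" is ignored. */
	if (sscanf(release, "%d.%d", major, minor) != 2)
		return -1;
	return 0;
}

/* Same test as "[ $1 -eq 3 -a $2 -ge 11 ]", on parsed integers. */
static int kernel_3_at_least_11(const char *release)
{
	int major, minor;

	if (kernel_version(release, &major, &minor))
		return 0;
	return major == 3 && minor >= 11;
}
```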
- 16 Sep, 2014 3 commits
- Andrey Vagin authored: $ cat /proc/self/mountinfo ... 1 1 0:2 / / rw - rootfs rootfs rw,size=373396k,nr_inodes=93349 ... You can see that mnt_id and parent_mnt_id are equal here. This patch interprets this case as the root mount of a tree. The 0th mount is rootfs, which is mounted in init_mount_tree(). We don't see it when the system does a chroot, because of this code in static int show_mountinfo(struct seq_file *m, struct vfsmount *mnt): ... /* mountpoints outside of chroot jail will give SEQ_SKIP on this */ err = seq_path_root(m, &mnt_path, &root, " \t\n\\"); Cc: beproject criu <beprojectcriu@gmail.com> Cc: Christopher Covington <cov@codeaurora.org> Reported-by: beproject criu <beprojectcriu@gmail.com> Reviewed-by: Christopher Covington <cov@codeaurora.org> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Pavel Emelyanov authored: A pid is at most 10 chars long. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Cyrill Gorcunov authored: When we compare sets in cg_set_compare() we presume that controller names are properly sorted, but because we use strcmp(cc->path, path) this is not true. In particular, if there are two sets that are the same except for their paths (00.126812) cg: `- New css ID 2 (00.127051) cg: `- [memory] -> [/vz-1] (00.127079) cg: `- [name=systemd] -> [/vz-1] (00.127108) cg: `- [net_cls] -> [/vz-1] (00.239829) cg: `- New css ID 3 (00.240067) cg: `- [memory] -> [/vz-1] (00.240096) cg: `- [net_cls] -> [/vz-1] (00.240154) cg: `- [name=systemd] -> [/vz-1/system.slice/dbus.service] we currently refuse to dump such a configuration. Thus remove the path comparison from the first place. CC: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Tycho Andersen <tycho.andersen@canonical.com> Acked-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
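
A sketch of the root-mount rule from the 16 Sep mountinfo commit. It assumes only what the message states: the first two fields of a mountinfo line are mnt_id and parent_mnt_id, and a line where they are equal is treated as the root of the tree. mountinfo_is_root() is a hypothetical name; CRIU's own parser also handles all the other fields.

```c
#include <stdio.h>

/*
 * Hypothetical check on one /proc/<pid>/mountinfo line.
 * Returns 1 if mnt_id == parent_mnt_id (treated as the tree root),
 * 0 if they differ, and -1 if the two IDs cannot be parsed.
 */
static int mountinfo_is_root(const char *line)
{
	int mnt_id, parent_mnt_id;

	if (sscanf(line, "%d %d", &mnt_id, &parent_mnt_id) != 2)
		return -1;
	return mnt_id == parent_mnt_id;
}
```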
 
- 15 Sep, 2014 1 commit
- Andrey Vagin authored: mntinfo contains mounts from all namespaces, so we can validate it just once, after collecting all mounts. v2: add a fake comment about goto v3: add a real comment about goto Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
 
- 12 Sep, 2014 2 commits
- Tycho Andersen authored: I'm not quite sure what the difference is (I have gcc 4.8, but there are probably also header differences), but when I compile the service on 14.04 I get: CC cr-service.o cr-service.c: In function ‘start_page_server_req’: cr-service.c:536:8: error: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Werror=unused-result] write(start_pipe[1], &ret, sizeof(ret)); ^ cr-service.c:544:6: error: ignoring return value of ‘read’, declared with attribute warn_unused_result [-Werror=unused-result] read(start_pipe[0], &ret, sizeof(ret)); ^ Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Tested-by: https://travis-ci.org/avagin/criu/builds/34990769 Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Konstantin Neumoin authored: Avoid err() for regular message reporting. Acked-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
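
For the 12 Sep warn_unused_result build error, one way to use the return values of the write()/read() pair on start_pipe is to treat both errors and short transfers as failures. send_int() and recv_int() are hypothetical helpers, not the code that was actually committed to cr-service.c. Retrying on EINTR is an extra assumption of this sketch.

```c
#include <errno.h>
#include <unistd.h>

/* Write one int to fd; returns 0 only if all sizeof(int) bytes went out. */
static int send_int(int fd, int val)
{
	ssize_t n;

	do
		n = write(fd, &val, sizeof(val));
	while (n < 0 && errno == EINTR);
	return n == (ssize_t)sizeof(val) ? 0 : -1;
}

/* Read one int from fd; EOF, errors and short reads all return -1. */
static int recv_int(int fd, int *val)
{
	ssize_t n;

	do
		n = read(fd, val, sizeof(*val));
	while (n < 0 && errno == EINTR);
	return n == (ssize_t)sizeof(*val) ? 0 : -1;
}
```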
 
- 10 Sep, 2014 5 commits
- Andrey Vagin authored: tmpfs has the "size" option, which is not standard. Execute zdtm/live/static/mountpoints ./mountpoints --pidfile=mountpoints.pid --outfile=mountpoints.out Dump 2737 WARNING: mountpoints returned 1 and left running for debug needs Test: zdtm/live/static/mountpoints, Result: FAIL ==================================== ERROR ==================================== Test: zdtm/live/static/mountpoints, Namespace: Dump log : /root/git/criu/test/dump/static/mountpoints/2737/1/dump.log --------------------------------- grep Error --------------------------------- (00.146444) Error (mount.c:399): Two shared mounts 50, 67 have different sets of children (00.146460) Error (mount.c:402): 67:./zdtm_mpts/dev/share-1 doesn't have a proper point for 54:./zdtm_mpts/dev/share-3/test.mnt.share (00.146820) Error (cr-dump.c:1921): Dumping FAILED. ------------------------------------- END ------------------------------------- ================================= ERROR OVER ================================= Signed-off-by: Andrey Vagin <avagin@openvz.org> Tested-by: Ruslan Kuprieiev <kupruser@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Andrey Vagin authored: Currently we strip options from only one of the brothers, but mount_equal() expects two brothers to have the same options. Execute zdtm/live/static/mountpoints ./mountpoints --pidfile=mountpoints.pid --outfile=mountpoints.out Dump 2737 WARNING: mountpoints returned 1 and left running for debug needs Test: zdtm/live/static/mountpoints, Result: FAIL ==================================== ERROR ==================================== Test: zdtm/live/static/mountpoints, Namespace: Dump log : /root/git/criu/test/dump/static/mountpoints/2737/1/dump.log --------------------------------- grep Error --------------------------------- (00.146444) Error (mount.c:399): Two shared mounts 50, 67 have different sets of children (00.146460) Error (mount.c:402): 67:./zdtm_mpts/dev/share-1 doesn't have a proper point for 54:./zdtm_mpts/dev/share-3/test.mnt.share (00.146820) Error (cr-dump.c:1921): Dumping FAILED. ------------------------------------- END ------------------------------------- ================================= ERROR OVER ================================= Reported-by: Ruslan Kuprieiev <kupruser@gmail.com> Signed-off-by: Andrey Vagin <avagin@openvz.org> Tested-by: Ruslan Kuprieiev <kupruser@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Andrey Vagin authored: "continue" is executed by mistake, so we skip a few checks for shared mounts without siblings. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Pavel Emelyanov authored: We have a slight mess with how criu restores the root task. Right now we have the following options:
  1) CLI
     a) usually: task calling criu `- criu `- root restored task
     b) when --restore-detached AND root has pdeath_sig: task calling criu `- criu `- root restored task
  2) Library/SWRK: task using lib/swrk `- criu `- root restored task
  3) Standalone service
     a) usually: service `- service sub task `- root restored task
     b) when root has pdeath_sig: criu service `- criu sub task `- root restored task
  It would be better if CRIU always restored the root task as a sibling, but we have 3 constraints. First, case 1.a is kept for zdtm to run tests in pid namespaces on 3.11, which in turn doesn't allow CLONE_PARENT | CLONE_NEWPID. Second, the CLI without --restore-detached waits for the restored task to die, and this behavior may already be "expected". Third, in the case of the standalone service, tasks shouldn't become the service's children.
  And I have one "plan". The p.haul project, while live migrating tasks, starts a service on the destination node that uses library/swrk mode. In this case the restored processes become the p.haul service's kids, which is also not great.
  That said, here's the option called --restore-child that pairs with --restore-detached like this:
  * detached AND child: task `- criu restore (exits at the end) `- root task. The root task will become task's child. This will be the default for library/swrk. This is what LXC needs.
  * detached AND !child: task `- criu restore (exits at the end) `- root task. The root task will get re-parented to init. This will be compatible with 1.3. This will be the default for the standalone service and, as I'd like, for the p.haul case.
  * !detached AND child: task `- criu restore (waits for root task to die) `- root task. This should be deprecated, so that criu restore doesn't mess with task <-> root task signalling.
  * !detached AND !child: task `- criu restore (waits for root task to die) `- root task. This is how plain criu restore works now.
  Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Tycho Andersen <tycho.andersen@canonical.com> Acked-by: Andrew Vagin <avagin@openvz.org>
- Tycho Andersen authored: root_as_sibling was used in criu_signals_setup(), but was only set later (when forking the root task for the first time). This meant that SA_NOCLDSTOP was never masked off, so SIGCHLD was never delivered after ptracing the root task. Thus, when a child of the root task died (e.g. from cr_system), the root task sat in PTRACE_STOP and the restore task never PTRACE_CONT'd it, resulting in a deadlock. Instead, we now unmask SA_NOCLDSTOP right before PTRACE_SEIZE, after the value is set. v2: re-work the condition for CLONE_PARENT v3: move unmasking of SA_NOCLDSTOP to restore_root_task v4: keep all the comments in the original code Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
 
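For the two tmpfs mount commits of 10 Sep: the "size" option can differ between otherwise equal mounts. It therefore has to be stripped from both brothers before a mount_equal()-style comparison, not from just one. Below is a minimal sketch under that reading. strip_size_opt() and opts_equal_after_strip() are hypothetical helpers. Matching on the literal "size=" prefix is an assumption about how the option is spelled in mountinfo.

```c
#include <string.h>

/*
 * Hypothetical: remove every "size=..." entry from a comma-separated
 * option string in place, e.g. "rw,size=1024k,mode=755" -> "rw,mode=755".
 * The write position never passes the read position, so this is safe.
 */
static void strip_size_opt(char *opts)
{
	char *src = opts, *dst = opts;
	int first = 1;

	while (*src) {
		char *end = strchr(src, ',');
		size_t len = end ? (size_t)(end - src) : strlen(src);

		if (!(len >= 5 && strncmp(src, "size=", 5) == 0)) {
			if (!first)
				*dst++ = ',';
			memmove(dst, src, len);
			dst += len;
			first = 0;
		}
		src += len;
		if (*src == ',')
			src++;
	}
	*dst = '\0';
}

/* Strip both brothers the same way before comparing their options. */
static int opts_equal_after_strip(char *a, char *b)
{
	strip_size_opt(a);
	strip_size_opt(b);
	return strcmp(a, b) == 0;
}
```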
- 09 Sep, 2014 2 commits
- Pavel Emelyanov authored: Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Saied Kazemi authored: Since the command line for checkpointing and restoring Docker containers is very long and some manual steps are involved before restoring a container, it's much easier to use a shell script to automate the work. One would simply do: $ sudo docker_cr.sh -c and then $ sudo docker_cr.sh -r Signed-off-by: Saied Kazemi <saied@google.com> Acked-by: Filipe Brandenburger <filbranden@google.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
 
- 05 Sep, 2014 7 commits
- Andrey Vagin authored: Running zdtm.sh with zero dump/restore iterations checks the correctness of the tests themselves. $ bash test/zdtm.sh -i 0 zdtm/inotify00 Output file: /root/git/orig/criu/test/zdtm/live/static/inotify00.out ------------------------------------------------------------------------------ 19:16:29.601: 6905: unlink 02 : event 0x200 -> IN_DELETE 19:16:29.602: 6905: unlink 02 : event 0x200 -> IN_DELETE 19:16:29.602: 6905: unlink 02 : event 0x8 -> IN_CLOSE_WRITE 19:16:29.602: 6905: unlink 02 : event 0x8 -> IN_CLOSE_WRITE 19:16:29.602: 6905: unlink 02 : event 0x400 -> IN_DELETE_SELF 19:16:29.602: 6905: unlink 02 : event 0x8000 -> IN_IGNORED 19:16:29.602: 6905: unlink 02 : read 6 events 19:16:29.614: 6905: after : event 0x8 -> IN_CLOSE_WRITE 19:16:29.614: 6905: after : read 1 events 19:16:29.614: 6905: FAIL: inotify00.c:217: Unhandled events in emask 0x200 -> IN_DELETE (errno = 11 (Resource temporarily unavailable)) ------------------------------------- END ------------------------------------- ================================= ERROR OVER ================================= This patch removes the logic about linked files, because it's useless. Signed-off-by: Andrey Vagin <avagin@openvz.org> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Andrew Vagin authored: BTRFS returns the subvolume dev-id instead of the superblock dev-id; in such a case, return the device obtained from mountinfo (i.e. subvolume0). v2: fix up devices only for btrfs files. v3: use phys_stat_dev_match instead of phys_stat_resolve_dev v4: fix cosmetic whims Reported-by: Mr Jenkins Signed-off-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Konstantin Neumoin authored: 11: POSIX ADVISORY WRITE 1 b6:a4111:136512 0 EOF Acked-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Cyrill Gorcunov authored: It turns out that we can't be too strict about queued events: criu itself generates a number of them, and there is no clear way yet to resolve this. So defer the "strict" mode for now, but print a warning. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Pavel Emelyanov authored: When the service starts the page server, all the preparations (log, wdir, img dir, etc.) happen in the parent task, and then we fork the page server. This is OK for now, but once we serve several requests per connection, all these resources would be leaked in the parent. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Pavel Emelyanov authored: The problem with several requests is that criu leaks resources after a dump/restore. That's OK since the process exits anyway, but for multiple requests per connection it's better to audit this. For now, allow further requests only after a page-server-start one. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Pavel Emelyanov authored: That's preparation for the "several requests per connection" patch. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
 