- 10 Jul, 2018 2 commits
Adrian Reber authored
The Travis build for s390x started to fail with:
  Failed to fetch http://security.debian.org/debian-security/dists/jessie/updates/InRelease
  Unable to find expected entry 'main/binary-s390x/Packages' in Release file (Wrong sources.list entry or malformed file)
This changes the repository definition in the same way as is already done for ppc64le.
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
Send pre-dump notify to 36
Traceback (most recent call last):
  File "zdtm.py", line 2161, in <module>
    do_run_test(tinfo[0], tinfo[1], tinfo[2], tinfo[3])
  File "zdtm.py", line 1549, in do_run_test
    cr(cr_api, t, opts)
  File "zdtm.py", line 1264, in cr
    test.pre_dump_notify()
  File "zdtm.py", line 490, in pre_dump_notify
    fdin.write(struct.pack("i", 0))
TypeError: write() argument 1 must be unicode, not str
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
- 09 Jul, 2018 38 commits
Adrian Reber authored
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Pavel Tikhomirov authored
The idea of the test is:
1) mmap a separate page and put the variable there, so that other usage does not dirty this region. Initialize the variable with VALUE_A.
2) fork a child with the special pid == CHILD_NS_PID. Only if it is the first child, overwrite the variable with VALUE_B.
3) wait for the end of the next pre-dump or the end of restore with the test_wait_pre_dump_ack/test_wait_pre_dump pair, and kill our child. Note: the memory region is "clean" in the parent.
4) goto (2) unless the end of c/r is reported by test_waitpre.
So on the first iteration the child with pid CHILD_NS_PID was dumped with VALUE_B; on all other iterations and on the final dump another child with the same pid exists, but with VALUE_A. But on all iterations after the first one this memory region is "clean". So criu before the fix would have restored VALUE_B, taking it from the first child's image, but it should restore VALUE_A.
Note: the child in its turn waits for termination and checks that the variable's value doesn't change after c/r.
We should run the test with at least one pre-dump to trigger the problem:

  [root@snorch criu]# ./test/zdtm.py run --pre 1 -k always -t zdtm/transition/pid_reuse
  Checking feature ns_pid
  Checking feature ns_get_userns
  Checking feature ns_get_parent
  === Run 1/1 ================ zdtm/transition/pid_reuse
  ===================== Run zdtm/transition/pid_reuse in ns ======================
  DEP pid_reuse.d
  CC pid_reuse.o
  LINK pid_reuse
  Start test
  Test is SUID
  ./pid_reuse --pidfile=pid_reuse.pid --outfile=pid_reuse.out
  Run criu pre-dump
  Send the 10 signal to 52
  Run criu dump
  Run criu restore
  Send the 15 signal to 73
  Wait for zdtm/transition/pid_reuse(73) to die for 0.100000
  Test output: ================================
  14:47:57.717: 11235: ERR: pid_reuse.c:76: Wrong value in a variable after restore
  14:47:57.717: 4: FAIL: pid_reuse.c:110: Task 11235 exited with wrong code 1 (errno = 11 (Resource temporarily unavailable))
  <<< ================================

https://jira.sw.ru/browse/PSBM-67502
v3: simplify waitpid's status check
v9: switch to test_wait_pre_dump(_ack)
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
*** CID 190172: Uninitialized variables (UNINIT)
/criu/mem.c: 325 in detect_pid_reuse()
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Kirill Tkhai authored
Commit 37e4c7bfc264 fixed arm, ppc and x86 (32-bit), but it introduced a wrong definition for x86_64. Fix that. Also, add a comment to the raw fork() implementation.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Pawel Stradomski authored
Signed-off-by: Pawel Stradomski <pstradomski@google.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Radostin Stoyanov authored
The `port` option is converted from an unsigned short integer to network byte order twice. Unfortunately, the second conversion reverses the first one. Example (on a little-endian host):

    #include <stdio.h>
    #include <arpa/inet.h>
    #include <stdlib.h>

    int main()
    {
        printf("%d\n", htons(atoi("1234")));        /* 53764 */
        printf("%d\n", htons(htons(atoi("1234")))); /* 1234 */
        return 0;
    }

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
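One way to avoid the double conversion is to convert exactly once, at the point where the value is stored. The sketch below assumes a hypothetical `parse_port()` helper (not CRIU's actual option-parsing code) that validates the string and hands back the port already in network byte order:

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal port string and store it in
 * network byte order. The conversion happens here and nowhere else,
 * so callers must not apply htons() to the result again. */
static int parse_port(const char *str, uint16_t *port_be)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(str, &end, 10);
	if (errno || end == str || *end != '\0' || val > 65535)
		return -1; /* empty, trailing garbage, or out of range */

	*port_be = htons((uint16_t)val); /* the single host-to-network conversion */
	return 0;
}
```

A caller would then copy the stored value into the socket address as-is (for example `addr.sin_port = port_be;`) without calling htons() a second time.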
-
Mike Rapoport authored
The criu_status_in descriptor is not always used, and it may be -1 when the signal handler closes it. With lazy-pages we hit a corner case that clobbers the errno value. This happens when we resume the process inside a glibc syscall wrapper and get the signal before the page containing errno has been copied. In this case the signal handler is invoked before the syscall's return value is written to errno, and the value of errno seen by the process ends up as EBADF because of close(-1) in the signal handler.
Let's make sure that close() in the signal handler does not fail, to make Jenkins happier while a proper solution for the lazy-pages issue is being found.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
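The commit guards the close() call. A more general signal-handler pattern, shown here as a sketch with a hypothetical `status_fd` variable standing in for `criu_status_in`, is to skip invalid descriptors and also save and restore `errno` around any call that may modify it:

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical stand-in for criu_status_in; -1 means "not in use". */
static volatile sig_atomic_t status_fd = -1;

static void sigchld_handler(int signo)
{
	/* The interrupted code may not have consumed errno yet. */
	int saved_errno = errno;

	(void)signo;
	if (status_fd >= 0) { /* never call close(-1): it would set errno to EBADF */
		close(status_fd);
		status_fd = -1;
	}

	errno = saved_errno; /* leave errno exactly as the interrupted code left it */
}
```

Saving and restoring `errno` protects the interrupted code even if a future change adds another call to the handler that can fail.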
-
Andrei Vagin authored
Cc: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
Reported-by: Dmitry Safonov <dima@arista.com>
Cc: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Mike Rapoport authored
kerndat_init() is now called before the jump to the action handler. This allows us to use kdat directly, without calling the corresponding kerndat_*() methods.
✓ travis-ci: success for lazy-pages: update checks for availability of userfaultfd (rev3)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Andrei Vagin authored
Call "ps axf" if waitpid() has been running for more than 10 seconds.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Pavel Tikhomirov authored
1) fix sfle memory leak on get_fle_for_scm error
2) fix gfd open descriptor leak on get_fle_for_scm error
3-6) fix buf memory leak on read and pwrite errors
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
-
Cyrill Gorcunov authored
On Fedora Rawhide, seccomp_metadata is for some reason not defined (while in the kernel it was introduced together with PTRACE_SECCOMP_GET_METADATA). So let's use a trick for now and define our own alias. Once the system headers settle down we might find a more suitable solution. Because it's part of the kernel API, we're on the safe side.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
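A sketch of the "own alias" approach, assuming the kernel UAPI layout of `struct seccomp_metadata` (two 64-bit fields, `filter_off` and `flags`) and the request number 0x420d for `PTRACE_SECCOMP_GET_METADATA`. The `criu_seccomp_metadata` tag is a hypothetical name chosen for illustration, not necessarily the one CRIU uses:

```c
#include <stdint.h>
#include <sys/ptrace.h>

/* Fall back to the kernel's request number if the C library headers
 * do not provide it yet. */
#ifndef PTRACE_SECCOMP_GET_METADATA
# define PTRACE_SECCOMP_GET_METADATA 0x420d
#endif

/* Private copy of the kernel's struct seccomp_metadata. A distinct tag
 * avoids a redefinition clash on systems whose headers already declare
 * the structure, while the ABI layout stays identical. */
struct criu_seccomp_metadata {
	uint64_t filter_off; /* input: index of the filter to query */
	uint64_t flags;      /* output: flags the filter was installed with */
};
```

Since the kernel only cares about the layout, a pointer to such a structure can be passed as the `data` argument, with its size as `addr`, in a ptrace(PTRACE_SECCOMP_GET_METADATA, ...) call.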
-
Cyrill Gorcunov authored
After the CR_STATE_RESTORE_SIGCHLD stage has been triggered we are not allowed to exit, so just yield a BUG instead.
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
The seccomp_metadata structure may already be defined in the system ptrace.h header on recent kernels, so include it.
https://github.com/checkpoint-restore/criu/issues/486#event-1628406918
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
*** CID 190178: Null pointer dereferences (NULL_RETURNS)
/criu/seccomp.c: 296 in collect_filters()
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Credential commitment affects the dumpable flag and the pdeath signal, so we have to move their restore after restore_creds, just like we already do in __export_restore_task (i.e. for the group leader).
https://jira.sw.ru/browse/PSBM-84198
Acked-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Looking up a pid in nested pid namespaces is supposed to be done for non-group-leaders only. __export_restore_thread does this check on its own, so we don't have to do a similar lookup, especially for the group leader, where the pids in args were never valid.
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
- Fix typo in sizeof() operand
- Eliminate redundant prctl calls if PTRACE_SECCOMP_GET_METADATA is not detected
Reported-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Andrew proposed the test that actually triggered the issue in the current seccomp series; make it part of the regular test runs.
Suggested-by: Andrey Vagin <avagin@virtuozzo.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
When deciding whether to set PTRACE_O_SUSPEND_SECCOMP on a tid, we should take into account whether there is at least one thread with seccomp mode enabled. Otherwise we might miss filter suspension, and the restore procedure might break because criu's own syscalls get filtered out.
At the same time, we should move the seccomp restore for threads to after the CR_STATE_RESTORE_SIGCHLD state, so that the main criu code attaches to the threads and sets up the seccomp suspension flag before we start restoring the filters.
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
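The corrected condition ("at least one thread", not just the leader) can be sketched as follows. The per-thread mode array and the `need_seccomp_suspend()` helper are hypothetical; the mode constants come from the kernel header `<linux/seccomp.h>`:

```c
#include <linux/seccomp.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: filter suspension must be requested if ANY thread
 * of the group has seccomp enabled, not only the group leader. */
static bool need_seccomp_suspend(const int *thread_modes, size_t nr_threads)
{
	for (size_t i = 0; i < nr_threads; i++) {
		if (thread_modes[i] != SECCOMP_MODE_DISABLED)
			return true;
	}
	return false;
}
```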
-
Cyrill Gorcunov authored
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
To checkpoint per-thread seccomp filters we need a significant rework of the dumping code. The general idea is the following:
- Each thread is tracked by its tid inside a global seccomp rbtree, so we can easily add entries there or look them up on demand.
- When we collect threads into pstree entries, we fetch each thread's seccomp mode in the procfs parsing routine and allocate a new entry in the rbtree to remember the mode. Note that at this moment we're not dumping the real filters yet (because the filter data image is a single one for all consumers).
- Once all tids are collected and our tree is complete, we call the seccomp_collect_dump_filters helper, which walks every pstree entry and iterates over each tid in the thread group, calling seccomp_dump_thread, which in turn uses the ptrace engine to fetch the filters and keep this data in memory. To optimize data usage we figure out whether we can use the TSYNC flag on restore by calling the try_use_tsync helper: with the TSYNC flag the kernel automatically propagates a filter to all threads, so we need to compare all filters inside the thread group for identity, since there is no other way to figure out whether the user passed the TSYNC flag when creating the filters.
- Finally, dump_seccomp_filters is called, which actually writes the seccomp filter data into an image file.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
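The identity check that decides whether TSYNC can be used may be illustrated as below. This is not CRIU's try_use_tsync: the `thread_filters` structure and the idea of representing each chain as an array of serialized filter blobs are assumptions made for the sketch. TSYNC is only safe when every thread carries exactly the same chain:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of one thread's seccomp filter chain:
 * nr_filters serialized BPF programs, each with its own length. */
struct thread_filters {
	size_t nr_filters;
	const size_t *lens;               /* size of each filter in bytes */
	const unsigned char *const *data; /* filter contents */
};

static bool chains_equal(const struct thread_filters *a,
			 const struct thread_filters *b)
{
	if (a->nr_filters != b->nr_filters)
		return false;
	for (size_t i = 0; i < a->nr_filters; i++) {
		if (a->lens[i] != b->lens[i] ||
		    memcmp(a->data[i], b->data[i], a->lens[i]) != 0)
			return false;
	}
	return true;
}

/* TSYNC may be requested on restore only if all threads share an
 * identical chain: the kernel does not record whether TSYNC was
 * passed originally, so identity is the only usable evidence. */
static bool can_use_tsync(const struct thread_filters *threads, size_t nr_threads)
{
	for (size_t i = 1; i < nr_threads; i++) {
		if (!chains_equal(&threads[0], &threads[i]))
			return false;
	}
	return true;
}
```

Comparing every thread against the first one is enough, since equality is transitive; a single mismatch forces per-thread restore.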
-
Cyrill Gorcunov authored
For now we pretend that all threads share seccomp chains, and at checkpoint time we test the seccomp modes to make sure this assumption is valid, refusing to dump otherwise. Still, the kernel tracks seccomp filter chains per thread, and we've now faced applications (such as Java) where per-thread chains are actively used. Thus we need to support handling filters on a per-thread basis.
In this somewhat intrusive patch the restore engine is lifted up to treat each thread separately. Here is what is done:
- The core image file is modified to keep seccomp filters inside thread_core_entry. For backward compatibility, the former seccomp_mode and seccomp_filter members in task_core_entry are renamed to have an old_ prefix, and on restore we test whether we're dealing with old images. Since per-thread dump is not yet implemented, the dumping procedure continues operating with the old_ members.
- In the pie restorer code, memory containing filters is addressed from inside the thread_restore_args structure, which now contains the seccomp mode itself and the chain attributes (number of filters, etc.). Reading of per-thread data is done in the seccomp_prepare_threads helper: we take one pstree_item and walk over every thread inside it to allocate pie memory and pin the data there. Because of PIE specifics, before jumping into the pie code we have to relocate this memory to a new place, and seccomp_rst_reloc serves this purpose. In the restorer itself we check whether thread_restore_args provides an enabled seccomp mode (strict or filter) and call restore_seccomp_filter if needed.
- To unify names we start using the seccomp_ prefix for everything involved in this change (prepare_seccomp_filters is renamed to seccomp_read_image because it only reads the image and nothing more, and the image handler is renamed to seccomp_img_entry instead of the too-short 'se').
With this change we're now able to start collecting and dumping seccomp filters per thread, which will be done in the next patch.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Note that there is no real use of this flag on restore yet; we simply save it in the image and will make real use of it later.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
This header is the main place for all seccomp-related structures, so move seccomp_info here. This minimizes the area of changes when definitions need to be updated.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
For grepability's sake in logs.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
We will use it to figure out whether the filter log target is used. Metadata associated with a seccomp filter is a relatively new feature that allows userspace to get it and set it back.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Pavel Tikhomirov authored
If the pre-dump-notify flag is set, zdtm sends a notification to the test after a pre-dump has finished and waits for the test to reply that it has done all its work and is now ready for the next pre-dump/dump.
How it can be used:

    while (!test_wait_pre_dump()) {
        /* Do something after predump */
        test_wait_pre_dump_ack();
    }
    /* Do something after restore */

Internally we open two pipes for the test: one for receiving notifications (with both ends open) and one for replying to them (only the write end open). The pipe fds are dup'ed to predefined numbers, and zdtm opens these fds through /proc/<test-pid>/fd/{100,101} and communicates with the test.
v9: switch to a two-way interface to remove the race where an operation we run after a pre-dump may still be unfinished at the time of the next dump.
Suggested-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
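The notify/ack handshake can be modelled in a single self-contained program, with the parent playing zdtm and a forked child playing the test. This is a simplified sketch, not the zdtm code: the pipes are inherited across fork() instead of being dup'ed to fds 100/101 and opened via /proc, the message is assumed to be a single int, EOF on the notify pipe is assumed to mean "c/r finished", and `wait_pre_dump()`/`pre_dump_ack()` are hypothetical stand-ins for the zdtm helpers:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Test side: returns 0 when a pre-dump notification arrives and 1 once
 * the notify pipe is closed (assumed here to mean "c/r is finished"). */
static int wait_pre_dump(int notify_fd)
{
	int msg;
	return read(notify_fd, &msg, sizeof(msg)) == sizeof(msg) ? 0 : 1;
}

/* Test side: tell the harness that the post-pre-dump work is complete. */
static void pre_dump_ack(int ack_fd)
{
	int msg = 0;
	if (write(ack_fd, &msg, sizeof(msg)) != sizeof(msg))
		_exit(2);
}

/* Runs nr_predumps notify/ack rounds; returns the number of acks seen. */
static int run_handshake(int nr_predumps)
{
	int notify[2], ack[2], msg = 0, acks = 0;

	if (pipe(notify) || pipe(ack))
		return -1;

	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) { /* child: the "test" */
		close(notify[1]); /* so that EOF is seen once the parent closes its end */
		close(ack[0]);
		while (!wait_pre_dump(notify[0])) {
			/* ... do something after pre-dump ... */
			pre_dump_ack(ack[1]);
		}
		/* ... do something after restore ... */
		_exit(0);
	}

	close(notify[0]); /* parent: the "harness" */
	close(ack[1]);
	for (int i = 0; i < nr_predumps; i++) {
		if (write(notify[1], &msg, sizeof(msg)) != sizeof(msg))
			break;
		/* Block until the test acks, so the next dump never races
		 * with work the test started after this pre-dump. */
		if (read(ack[0], &msg, sizeof(msg)) == sizeof(msg))
			acks++;
	}
	close(notify[1]); /* EOF tells the test that c/r is over */
	close(ack[0]);
	waitpid(pid, NULL, 0);
	return acks;
}
```

The blocking read of the ack is what removes the race described in the v9 note: the harness cannot start the next dump until the test reports that its post-pre-dump work is done.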
-
Pavel Tikhomirov authored
We have a problem when a pid is reused between consecutive dumps: we can't tell whether the pagemap and pages from the parent dump's images are invalid for restoring this pid. That can even lead to the wrong memory being restored for this pid; see the test in the last patch.
So this is an attempt to separate processes with a (likely) invalid previous memory dump from processes with a 100% valid previous dump. For that we use the start_time value from /proc/<pid>/stat and also the timestamp of each (pre)dump. If the start time is strictly less than the timestamp, the pagemap for this pid from the previous dump is valid: it was made for exactly the same process.
Creation time is in centiseconds by default, so if a pre-dump is really fast (<1 csec) we can get false-negative decisions for some processes, but for long-running processes we are fine.
https://jira.sw.ru/browse/PSBM-67502
v2: remove __maybe_unused for get_parent_stats; fix get_parent_stats to have static typing; print warning only if unsure; check has_dump_uptime
v3: read parent stats from image only once; reuse stat from previous parse_pid_stat call on dump
v4: move code to function; use unsigned long long for ticks; put proc_pid_stat on mem_dump_ctl; print warning on all pid-reuse cases
v5: free parent's stats entry properly, pass it in arguments to (pre_)dump_one_task
v6: free parent's stats in error path too
v7: zero init parent_se
v8: improve error message
v9: switch to inventory image from stats; if the pid-reuse check fails, fail the current dump
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
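The core comparison can be sketched as follows. The starttime field of /proc/<pid>/stat is expressed in clock ticks since boot (see sysconf(_SC_CLK_TCK), commonly 100, hence the centisecond resolution mentioned above), while the (pre)dump timestamp is assumed here to be a system uptime in microseconds. Both helper names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

/* Convert a starttime value (clock ticks since boot) to microseconds. */
static uint64_t ticks_to_usec(uint64_t ticks, long clk_tck)
{
	return ticks * USEC_PER_SEC / (uint64_t)clk_tck;
}

/* True only when the previous dump's pages certainly belong to this
 * process: it must have started strictly before that dump was taken.
 * Otherwise the pid may have been reused, and the parent images must
 * not be trusted for this pid. */
static bool prev_dump_is_valid(uint64_t start_ticks, long clk_tck,
			       uint64_t dump_uptime_usec)
{
	return ticks_to_usec(start_ticks, clk_tck) < dump_uptime_usec;
}
```

A process that started in the same tick as the dump compares as "not strictly less" and is treated as possibly reused. This is the conservative false negative the commit message accepts for very fast pre-dumps.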
-
Pavel Tikhomirov authored
Will be used in the next patch.
https://jira.sw.ru/browse/PSBM-67502
Note: actually we need only one value from the inventory entry, but I still prefer a general helper, as we need to read and allocate memory for the whole structure anyway.
v2: fix get_parent_stats to have static typing
v3: simplify get_parent_stats to return a StatsEntry pointer instead of doing it through arguments
v8: replace errors with warnings; we should watch them only if we have a corresponding error in detect_pid_reuse, otherwise they are fine
v9: change stats to inventory image
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Pavel Tikhomirov authored
We want to use a simple fact: if we have a live process in the pstree we want to dump, and the starttime of that process is less than the pre-dump's timestamp (taken while all processes were frozen), then this exact process existed (100% sure) at the time of that pre-dump, and its memory was dumped into the images. So save an inventory image on pre-dump and put the uptime there.
https://jira.sw.ru/browse/PSBM-67502
v9: improve comment; put uptime into the inventory image because 1) there are no stats in the parent images directory if the --work-dir option is set to something different from the images directory, and 2) stats-dump is not an image, and it is bad practice to put data required for restoring there.
v10: s/u_int64_t/uint64_t/
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Pavel Tikhomirov authored
Will be used in the next patch.
https://jira.sw.ru/browse/PSBM-67502
Note: the man page for /proc/uptime says that uptime is in seconds, and for now the format is "seconds.centiseconds", where the centiseconds part is 2 digits.
Note: for now uptime is in csec, but I prefer saving it in usec; that allows us to reuse this image field when/if we have a more accurate value.
v8: add length specifier to parse only centiseconds
v9: put uptime into u_int64_t directly, define CSEC_PER_SEC
v10: switch to uint64_t from u_int64_t, comment about usec in image
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
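Parsing /proc/uptime into microseconds, as described above, can be sketched like this. The function name is hypothetical, `CSEC_PER_SEC` follows the constant the commit mentions (its value of 100 is an assumption based on its name), and the input is taken from a string rather than the file so the example stays testable. The `%2` width limits the fractional part to the two centisecond digits:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define CSEC_PER_SEC 100
#define USEC_PER_SEC 1000000ULL

/* Parse the first field of /proc/uptime ("seconds.centiseconds") and
 * return it in microseconds, so the stored value can later carry a
 * finer resolution without changing its unit. Returns 0 on success. */
static int parse_uptime_usec(const char *buf, uint64_t *usec)
{
	uint64_t sec, csec;

	if (sscanf(buf, "%" SCNu64 ".%2" SCNu64, &sec, &csec) != 2)
		return -1;

	*usec = sec * USEC_PER_SEC + csec * (USEC_PER_SEC / CSEC_PER_SEC);
	return 0;
}
```

In real use the buffer would come from reading /proc/uptime (for example with fopen() and fgets()); only the first of its two numbers, the system uptime, is parsed here.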
-
Pawel Stradomski authored
Signed-off-by: Pawel Stradomski <pstradomski@google.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Paweł Stradomski authored
Signed-off-by: Pawel Stradomski <pstradomski@google.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Pawel Stradomski authored
This makes it possible for page server communication to go over anonymous unix sockets, e.g. ones created by socketpair(). Such a setup makes it easier to secure the page server connection by wrapping it in an encrypted tunnel. It also helps prevent attacks where a malicious process connects to the page server and injects its own stream of pages, either to fool criu into restoring the wrong pages or to DoS the page server by making it exhaust local storage with large .img files.
Signed-off-by: Pawel Stradomski <pstradomski@google.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
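Why an anonymous socket helps can be seen in a minimal sketch: the two ends returned by socketpair() have no address in the filesystem or on the network, so only processes that inherit or are explicitly passed one of the descriptors can use the channel. How the descriptor is handed to criu is not shown here; the `roundtrip()` helper is hypothetical and simply moves one buffer across the pair:

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send len bytes from one end of an anonymous AF_UNIX stream pair and
 * read them back from the other end. Returns bytes received, or -1. */
static ssize_t roundtrip(const void *buf, size_t len, void *out)
{
	int sv[2];
	ssize_t n = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	/* In a page-server setup, sv[0] would stay with the dumping side and
	 * sv[1] with the page server. Neither end has an address, so an
	 * unrelated process has nothing to connect() to. */
	if (write(sv[0], buf, len) == (ssize_t)len)
		n = read(sv[1], out, len);

	close(sv[0]);
	close(sv[1]);
	return n;
}
```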
-
Cyrill Gorcunov authored
Reported-by: Andrey Vagin <avagin@virtuozzo.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-