- 27 Oct, 2018 6 commits
-
-
Cyrill Gorcunov authored
Will need them to mask some of the features from command line options.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
We don't yet support compacted xsave frames, so report an error on the cpu-check, checkpoint and restore actions. This is done in the cpu_init routine, which is called at all the sites we're interested in.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
For debug sake.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Tracking cpuid features is easier when they are kept in sync with the kernel source code. Note, though, that while in the kernel the feature bits are not part of the ABI, we save these bits into an image. As a result, make sure they are placed at the proper positions, keeping backward compatibility in mind. Here we also start using v2 of the cpuinfo image, which carries more feature bits.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
- 17 Jul, 2018 1 commit
-
-
Andrei Vagin authored
A set of images from criu dump can be used as a previous point when we are taking snapshots. In this case, each point contains a full set of images.

https://github.com/checkpoint-restore/criu/issues/479

v2: return -1 if invertory_save_uptime failed

Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
- 10 Jul, 2018 3 commits
-
-
Andrei Vagin authored
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Adrian Reber authored
The travis build for s390x started to fail with:

    Failed to fetch http://security.debian.org/debian-security/dists/jessie/updates/InRelease
    Unable to find expected entry 'main/binary-s390x/Packages' in Release file (Wrong sources.list entry or malformed file)

This changes the repository definition just like it is done for ppc64le.

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
    Send pre-dump notify to 36
    Traceback (most recent call last):
      File "zdtm.py", line 2161, in <module>
        do_run_test(tinfo[0], tinfo[1], tinfo[2], tinfo[3])
      File "zdtm.py", line 1549, in do_run_test
        cr(cr_api, t, opts)
      File "zdtm.py", line 1264, in cr
        test.pre_dump_notify()
      File "zdtm.py", line 490, in pre_dump_notify
        fdin.write(struct.pack("i", 0))
    TypeError: write() argument 1 must be unicode, not str

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
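
The TypeError in the traceback above is what a text-mode stream raises when it is handed the byte string produced by struct.pack(). The fix itself is not shown in this log. The following is only a minimal sketch of one way to avoid the error, assuming the notify descriptor can be opened in binary mode; the helper names and the use of os.pipe() as a stand-in for the real notify pipe are illustrative, not taken from zdtm.py.

```python
import os
import struct


def write_notify(wfd):
    # Binary, unbuffered mode: struct.pack() returns bytes, which a
    # binary stream accepts.
    with os.fdopen(wfd, "wb", 0) as fdin:
        fdin.write(struct.pack("i", 0))


def read_notify(rfd):
    # Read back exactly one packed int from the other end.
    with os.fdopen(rfd, "rb", 0) as fdout:
        data = fdout.read(struct.calcsize("i"))
    return struct.unpack("i", data)[0]


def text_mode_rejects_bytes():
    # Reproduces the failure mode on Python 3: a text stream refuses bytes.
    rfd, wfd = os.pipe()
    try:
        with os.fdopen(wfd, "w") as fdin:
            try:
                fdin.write(struct.pack("i", 0))
            except TypeError:
                return True
        return False
    finally:
        os.close(rfd)


rfd, wfd = os.pipe()  # hypothetical stand-in for the notify pipe
write_notify(wfd)
value = read_notify(rfd)
```

Opening the stream in binary mode keeps the same code path valid on both Python 2 and Python 3, since the payload is raw bytes in either case.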
-
- 09 Jul, 2018 30 commits
-
-
Adrian Reber authored
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Pavel Tikhomirov authored
The idea of the test is:

1) mmap a separate page and put a variable there, so that other usage does not dirty this region. Initialize the variable with VALUE_A.
2) fork a child with the special pid == CHILD_NS_PID. Only if it is the first child, overwrite the variable with VALUE_B.
3) wait for the end of the next predump or the end of restore with the test_wait_pre_dump_ack/test_wait_pre_dump pair and kill our child. Note: the memory region is "clean" in the parent.
4) goto (2) unless the end of cr is reported by test_waitpre

So on the first iteration the child with pid CHILD_NS_PID was dumped with VALUE_B; on all other iterations and on the final dump another child with the same pid exists, but with VALUE_A. On all iterations after the first one this memory region is "clean". So criu before the fix would have restored VALUE_B, taking it from the first child's image, whereas it should restore VALUE_A.

Note: the child in its turn waits for termination and checks that the variable's value doesn't change after c/r.

We should run the test with at least one predump to trigger the problem:

    [root@snorch criu]# ./test/zdtm.py run --pre 1 -k always -t zdtm/transition/pid_reuse
    Checking feature ns_pid
    Checking feature ns_get_userns
    Checking feature ns_get_parent
    === Run 1/1 ================ zdtm/transition/pid_reuse
    ===================== Run zdtm/transition/pid_reuse in ns ======================
    DEP pid_reuse.d
    CC pid_reuse.o
    LINK pid_reuse
    Start test
    Test is SUID
    ./pid_reuse --pidfile=pid_reuse.pid --outfile=pid_reuse.out
    Run criu pre-dump
    Send the 10 signal to 52
    Run criu dump
    Run criu restore
    Send the 15 signal to 73
    Wait for zdtm/transition/pid_reuse(73) to die for 0.100000
    Test output: ================================
    14:47:57.717: 11235: ERR: pid_reuse.c:76: Wrong value in a variable after restore
    14:47:57.717: 4: FAIL: pid_reuse.c:110: Task 11235 exited with wrong code 1 (errno = 11 (Resource temporarily unavailable))
    <<< ================================

https://jira.sw.ru/browse/PSBM-67502

v3: simplify waitpid's status check
v9: switch to test_wait_pre_dump(_ack)

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
*** CID 190172: Uninitialized variables (UNINIT)
/criu/mem.c: 325 in detect_pid_reuse()

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Kirill Tkhai authored
Commit 37e4c7bfc264 fixed arm, ppc and x86 (32-bit), but introduced a wrong definition for x86_64. Fix that. Also, add a comment to the raw fork() implementation.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Pawel Stradomski authored
Signed-off-by: Pawel Stradomski <pstradomski@google.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Radostin Stoyanov authored
The `port` option is converted from an unsigned short integer to network byte order twice. Unfortunately, the second conversion reverses the first one.

Example:

    #include <stdio.h>
    #include <arpa/inet.h>
    #include <stdlib.h>

    int main() {
            printf("%d\n", htons(atoi("1234")));        /* 53764 */
            printf("%d\n", htons(htons(atoi("1234")))); /* 1234 */
            return 0;
    }

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
-
Mike Rapoport authored
The criu_status_in descriptor is not always used, and it may be -1 when the signal handler closes it. With lazy-pages we hit a corner case which clobbers the errno value. This happens when we resume the process inside a glibc syscall wrapper and get the signal before the page containing errno is copied. In this case, the signal handler is invoked before the syscall return value is written to errno, and the value of errno seen by the process becomes EBADF because of close(-1) in the signal handler.

Let's ensure that close() in the signal handler does not fail, to make Jenkins happier while the proper solution for the lazy-pages issue is found.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
Cc: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
Reported-by: Dmitry Safonov <dima@arista.com>
Cc: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Mike Rapoport authored
The kerndat_init() is now called before the jump to the action handler. This allows us to use kdat directly, without calling the corresponding kerndat_*() methods.

✓ travis-ci: success for lazy-pages: update checks for availability of userfaultfd (rev3)

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Andrei Vagin authored
Call "ps axf" if waitpid() has been running for more than 10 seconds.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
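
The commit body is not shown here, so the following is only a rough sketch of the idea described above: poll waitpid() without blocking and, once a deadline passes, dump the process tree a single time for diagnostics. The function name, the polling interval and the diag_cmd parameter are assumptions made for illustration.

```python
import os
import subprocess
import time


def wait_with_diagnostics(pid, timeout=10.0, poll=0.1, diag_cmd=("ps", "axf")):
    """Wait for a child; run diag_cmd once if the wait exceeds timeout.

    Returns (status, dumped), where status is the raw waitpid() status and
    dumped tells whether the diagnostic command was run.
    """
    deadline = time.monotonic() + timeout
    dumped = False
    while True:
        # WNOHANG makes waitpid() return (0, 0) while the child is alive.
        wpid, status = os.waitpid(pid, os.WNOHANG)
        if wpid == pid:
            return status, dumped
        if not dumped and time.monotonic() >= deadline:
            # Show the process tree to help debug the hang, only once.
            subprocess.call(list(diag_cmd))
            dumped = True
        time.sleep(poll)
```

Polling with WNOHANG keeps the harness in control of the timeout without relying on signals or threads.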
-
Pavel Tikhomirov authored
1) fix sfle memory leak on get_fle_for_scm error
2) fix gfd open descriptor leak on get_fle_for_scm error
3-6) fix buf memory leak on read and pwrite errors

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
-
Cyrill Gorcunov authored
On Fedora rawhide seccomp_metadata is for some reason not defined (while in the kernel it was introduced together with PTRACE_SECCOMP_GET_METADATA). So let's use a trick for a while: define our own alias. Once the system headers settle down we might find a more suitable solution. Because it's a part of the kernel API we're on the safe side.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Once the CR_STATE_RESTORE_SIGCHLD stage has been triggered we are not allowed to exit, so raise a BUG instead.
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
seccomp_metadata may already be defined in the system ptrace.h header on recent kernels, so include it.

https://github.com/checkpoint-restore/criu/issues/486#event-1628406918

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
*** CID 190178: Null pointer dereferences (NULL_RETURNS)
/criu/seccomp.c: 296 in collect_filters()

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Committing credentials affects the dumpable flag and the pdeath signal, so we have to restore them after restore_creds, just as we do in __export_restore_task (i.e. for the group leader).

https://jira.sw.ru/browse/PSBM-84198

Acked-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Looking up the pid in a nested pidns is supposed to be done for non-group-leaders only. __export_restore_thread does this check on its own, so we don't have to make a similar lookup, especially on the group leader, where the pids in args were never valid.
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
- Fix typo in sizeof() operand
- Eliminate redundant prctl calls if no PTRACE_SECCOMP_GET_METADATA detected

Reported-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Andrew proposed the test that actually triggered the issue in the current seccomp series; run it on a regular basis.
Suggested-by: Andrey Vagin <avagin@virtuozzo.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
When deciding whether to set PTRACE_O_SUSPEND_SECCOMP on a tid, we should take into account whether at least one thread has seccomp mode enabled. Otherwise we might miss filter suspension, and the restore procedure might break because criu's own syscalls get filtered out.

At the same time, we should move the seccomp restore for threads so that it takes place after the CR_STATE_RESTORE_SIGCHLD state. That way the main criu code attaches to the threads and sets up the seccomp suspension flag before we start restoring the filters.

Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
To checkpoint per-thread seccomp filters we need a significant rework of the dumping code. The general idea is the following:

- Each thread is tracked by its tid inside a global seccomp rbtree, so we can easily add entries there or look them up on demand.

- When we collect threads into pstree entries, we fetch each thread's seccomp mode from the procfs parsing routine and allocate a new entry inside the rbtree to remember the seccomp mode. Note that at this moment we're not dumping the real filters yet (because the filter data image is a single one for all consumers).

- Once all tids are collected and our tree is complete, we call the seccomp_collect_dump_filters helper, which walks every pstree entry and iterates over each tid inside the thread group, calling seccomp_dump_thread. That in turn uses the ptrace engine to fetch the filters and keeps this data in memory. To optimize data usage we figure out whether we can use the TSYNC flag on restore by calling the try_use_tsync helper: with the TSYNC flag the kernel automatically propagates a filter to all threads, so we need to compare all filters inside the thread group for identity, since there is no other way to figure out whether the user passed the TSYNC flag when creating the filters.

- Finally dump_seccomp_filters is called, which does the real write of the seccomp filter data into an image file.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
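
The TSYNC check described above boils down to an identity comparison of the filter chains across a thread group. The sketch below is not the criu implementation (which is C and works on data fetched via ptrace). It only illustrates the comparison, with a hypothetical function name and a plain dict standing in for the tid-keyed rbtree.

```python
def can_use_tsync(filters_by_tid):
    """Return True when every thread in the group carries an identical chain.

    filters_by_tid maps a thread id to its filter chain, given here as a
    list of raw filter programs (bytes). Both the mapping and the
    representation are assumptions made for this sketch.
    """
    chains = list(filters_by_tid.values())
    if not chains:
        # Nothing to synchronize across threads.
        return False
    reference = chains[0]
    # TSYNC is only a safe substitute if all chains match the reference.
    return all(chain == reference for chain in chains[1:])
```

When the chains differ, each thread's filters would have to be restored separately, which is the case the per-thread rework is meant to support.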
-
Cyrill Gorcunov authored
For now we pretend that all threads share seccomp chains, and at checkpoint time we test the seccomp modes to verify this assumption, refusing to dump otherwise. Still, the kernel tracks seccomp filter chains per thread, and we have now faced applications (such as java) where per-thread chains are actively used. Thus we need to support handling filters on a per-thread basis.

In this somewhat intrusive patch the restore engine is reworked to treat each thread separately. Here is what is done:

- The core image file is modified to keep seccomp filters inside thread_core_entry. For backward compatibility the former seccomp_mode and seccomp_filter members in task_core_entry are renamed to have an old_ prefix, and on restore we test whether we're dealing with old images. Since per-thread dump is not yet implemented, the dumping procedure continues to operate with the old_ members.

- In the pie restorer code, the memory containing filters is addressed from inside the thread_restore_args structure, which now contains the seccomp mode itself and the chain attributes (number of filters, etc.). Reading of per-thread data is done in the seccomp_prepare_threads helper: we take one pstree_item and walk over every thread inside it to allocate pie memory and pin the data there. Because of PIE specifics, before jumping into pie code we have to relocate this memory to a new place, and seccomp_rst_reloc serves this purpose. In the restorer itself we check whether thread_restore_args provides an enabled seccomp mode (strict or filter) and call restore_seccomp_filter if needed.

- To unify names we start using the seccomp_ prefix for everything involved in this change (prepare_seccomp_filters is renamed to seccomp_read_image because it only reads the image and nothing more; the image handler is renamed to seccomp_img_entry instead of the too short 'se').

With this change we can now start collecting and dumping seccomp filters per thread, which will be done in the next patch.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Note that this flag is not really used on restore yet; we simply save it in the image and will make real use of it later.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
This header is the main place for all seccomp-related structures, so move seccomp_info here. This keeps changes localized when definitions need to be updated.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
For grepability sake in logs.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
We will use it to figure out whether the filter log target is used. Metadata associated with a seccomp filter is a relatively new feature that lets userspace get it and set it back.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Pavel Tikhomirov authored
If the pre-dump-notify flag is set, zdtm sends a notification to the test after a pre-dump has finished and waits for the test to reply that it has done all its work and is now ready for the next pre-dump/dump.

How it can be used:

    while (!test_wait_pre_dump()) {
            /* Do something after predump */
            test_wait_pre_dump_ack();
    }
    /* Do something after restore */

Internally we open two pipes for the test: one for receiving the notification (with both ends open) and one for replying to it (only the write end open). The fds of the pipes are dupped to predefined numbers, and zdtm opens these fds through /proc/<test-pid>/fd/{100,101} and communicates with the test.

v9: switch to a two-way interface to remove the race where the operation we try to run after predump may still be unfinished at the time of the next dump.

Suggested-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-