- 27 Oct, 2018 14 commits
Rodrigo Bruno authored
criu/image-desc.c    | 4 ++--
criu/image.c         | 4 ++--
criu/include/image.h | 1 +
3 files changed, 5 insertions(+), 4 deletions(-)

In order to prepare for remote snapshots (possible with Image Proxy and Image Cache), the O_FORCE_LOCAL flag is added to force some images not to be remote and to stay as local files in the file system.
Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Pavel Emelyanov authored
We'll need some docs :) but the API is

    criu := MakeCriu()
    criu.Dump(opts, notify)
    criu.Restore(opts, notify)
    criu.PreDump(opts, notify)
    criu.StartPageServer(opts)

where opts is the object from rpc.proto. Go has almost native support for those, so the caller should
- compile the .proto file
- export it and golang/protobuf/proto
- create and initialize the CriuOpts struct

and notify is an interface with callbacks that correspond to criu notification messages.

A stupid dump/restore tool in src/test/main.go demonstrates the above.

Changes since v1:
* Added keep_open mode for pre-dumps. To use it, one needs to call criu.Prepare() right after creation and criu.Cleanup() right after .Dump()
* Report the resp.cr_errmsg string on request error.

Further TODO:
- docs
- code comments

travis-ci: success for libphaul (rev2)
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Pavel Emelyanov authored
So, here's the next test that just enumerates all possible states and checks that CRIU C/R-s them well. This time -- pipes.

The goal of the test is to load the fd-sharing engine, so pipes are chosen, as they not only generate shared struct files, but also produce 2 descriptors in CRIU's fdesc->open callback, which is handled separately.

It's implemented slightly differently from the unix test, since we don't want to check sequences of syscalls on objects; we need to check the task-to-pipe relations in all possible ways. The 'state' is several tasks and several pipes, and each generated test includes pipe ends sitting in all possible combinations in the tasks' FDTs.

Also note that states that seem to be equal to each other, e.g. a pipe between tasks A->B and a pipe B->A, are really different, as CRIU picks the pipe-restorer based on task PIDs. So whether the picked task has the read end or the write end in its FDT makes a difference on restore.

The number of tasks is limited with the --tasks option, the number of pipes with the --pipes one.

The test just runs everything -- generates states, makes them and C/R-s them. To check the restored result, /proc/pid/fd/ and /proc/pid/fdinfo/ of all restored tasks are analyzed.

Right now CRIU works OK for --tasks 2 --pipes 2 (larger values were not checked).

Kirill, please check that your patches pass this test.

TODO:
- Randomize the FDs under which tasks see the pipes. Now all tasks that have some pipe see it under the same set of FDs.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Pavel Emelyanov authored
By exhaustive testing I understand a test suite that generates as many states to C/R as possible by trying all the possible sequences of system calls. Since such a generation, if done on all the Linux API we support in CRIU, would produce bazillions of processes, I propose to start with something simple. As a starting point -- unix stream sockets with abstract names that can be created and used by a single process :)

The script generates situations that unix sockets can get into by using a pre-defined set of system calls. In this patch the syscalls are socket, listen, bind, accept, connect and send. Also the number of system calls to use (i.e. the depth of the tree) is limited by the --depth option.

There are three things that can be done with a generated 'state':

I) Generate :) and show

Generation is done by recursively doing everything that is possible (and makes sense) in a given state. To reduce the size of the tree some meaningless branches are cut, e.g. creating a socket and closing it right after that, creating two similar sockets one-by-one and some more.

Shown on the screen is a cryptic string, e.g. 'SA-CX-MX_SBL', describing the sockets in the state. This is how it can be decoded:
- sockets are delimited with _
- first goes the type (S -- stream, D -- datagram)
- next goes the name state (A -- no name, B -- with name, X -- socket is not in the FD table, i.e. closed or not yet accepted)
- next may go the letter L, meaning that the socket is listening
- -Cx -- socket is connected and x is the peer's name state
- -Ixyz -- socket has an incoming connections queue and xyz are the connect()-ors' name states
- -Mxyz -- socket has messages and xyz are the senders' name states

The example above means that we have two sockets:
- SA-CX-MX: stream, with no name, connected to a dead one and with a message from a dead one
- SBL: stream, with name, listening

Next printed is the sequence of system calls to get into it, e.g. this is how to get into the state above:

    socket(S) = 1
    bind(1, $name-1)
    listen(1)
    socket(S) = 2
    connect(2, $name-1)
    accept(1) = 3
    send(2, $message-0)
    send(3, $message-0)
    close(3)

The program has created a stream socket, bound it and listened on it, then created another stream socket and connected it to the 1st one, then accepted the connection, sent a message in each direction and closed the accepted end, so the 2nd socket is left connected to the dead socket with a message from it.

II) Run the state

This is when the test actually creates a process that does the syscalls required to get into the generated state (and hopefully gets into it).

III) Check C/R of the state

This is the trickiest part when it comes to the R step -- it's not clear how to validate that the restored state is correct. But if only trying to dump the state -- it's just calling criu dump. The state string description is used as the images dir.

One may choose only to generate the states with the --gen option. One may choose only to run the states with the --run option. The latter is useful to verify that the states generator is actually producing valid states. If no options are given, the state is also dump-ed (restore is to come later).

For now the usage experience is like this:
- Going --depth 10 --gen (i.e. just generating all possible states that are achievable with 10 syscalls) produces 44 unique states in 0.01 seconds. The generated result covers some static tests we have in zdtm :) More generation stats:
    --depth 15 : 1.1 sec / 72 states
    --depth 18 : 13.2 sec / 89 states
    --depth 20 : 1 m 8 sec / 101 states
- Running and trying with criu is checked with --depth 9. Criu fails to dump the state SA-CX-MX_SBL (shown above) with the error
    Error (criu/sk-queue.c:151): recvmsg fail: error: Connection reset by peer

Nearest plans:
1. Add generators for on-disk socket names (now only abstract). Here an interesting case is when names overlap and one socket gets a name of another, but isn't accessible by it
2. Add datagram sockets. Here it'd be fun to look at how many-to-one connections are generated and checked.
3. Add socketpair()-s.

Further plans:
1. Cut the tree better to allow for a deeper tree scan.
2. Add restore.
3. Add SCM-s
4. Have the exhaustive testing for other resources.

Changes since v1:
* Added DGRAM sockets :) Dgram sockets are trickier than STREAM, as they can reconnect from one peer to another. Thus just limiting the tree depth results in weird states when a socket just changes peer. In v1 of this patch new sockets were added to the state only when old ones reported that there's nothing that can be done with them. This limited the amount of stupid branches, but this strategy doesn't work with dgram due to reconnect. Due to this, change #2:
* Added the --sockets NR option to limit the amount of sockets. This allowed throwing new sockets into the state on each step, which made a lot of interesting states for DGRAM ones.
* Added the 'restore' stage and checks after it. After the process is restored, the script performs as many checks as possible, having the expected state description in memory. The checks verify that the values below obtained from real sockets match the expectations in the generated state:
  - socket itself
  - name
  - listen state
  - pending connections
  - messages in queue (sender is not checked)
  - connectivity
  The latter is checked last, after all queues should be empty, by sending control messages with the socket.recv() method.
* Added the --keep option to run all tests even if one of them fails, and print a nice summary at the end.

So far the test found several issues:
- Dump doesn't work for a half-closed connection with unread messages
- Pending half-closed connection is not restored
- Socket name is not restored
- Message is not restored

New TODO:
- Check listen state is still possible to accept connections (?)
- Add socketpair()s
- Add on-disk names
- Add SCM-s
- Exhaustive script for other resources
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Andrew reported that previously he was able to c/r even on a machine with xsavec enabled, so allow it to proceed for now.

P.S. I'm investigating the problem; to not block the testing process, let's permit passing with the xsaves bit present.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
For easier understanding of what failed.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
With the new cpu-cap='op=noxsaves' mode on x86 we should use compel's instance of rt info, since only it carries the masked features.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Currently, even if the kernel supports the compact xsave frame, a user can disable it by passing the noxsaves argument as a boot option. Thus the cpuid instruction will report its presence, but in reality it is masked from the kernel's point of view. Let's do the same and allow a user to mask it via the --cpu-cap=noxsaves option (valid for x86 only).
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Will need them to mask some of the features from command line options.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
We don't support compacted xsave frames yet, so report an error on cpu-check, checkpoint and restore actions. Basically it is done in the cpu_init routine, which is called at the sites we're interested in.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
For debugging purposes.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Tracking cpuid features is easier when they are synced with the kernel source code. Note, though, that while in the kernel feature bits are not part of the ABI, we're saving the bits into an image, so we must make sure they are placed in the proper positions, keeping backward compatibility in mind.

Here we also start using v2 of the cpuinfo image with more feature bits.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
- 17 Jul, 2018 1 commit
Andrei Vagin authored
A set of images from criu dump can be used as a previous point when we are doing snapshots. In this case, each point contains a full set of images.

https://github.com/checkpoint-restore/criu/issues/479

v2: return -1 if inventory_save_uptime failed
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
- 10 Jul, 2018 3 commits
Andrei Vagin authored
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Adrian Reber authored
The travis build for s390x started to fail with:

    Failed to fetch http://security.debian.org/debian-security/dists/jessie/updates/InRelease
    Unable to find expected entry 'main/binary-s390x/Packages' in Release file (Wrong sources.list entry or malformed file)

This changes the repository definition just like it is done for ppc64le.
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
    Send pre-dump notify to 36
    Traceback (most recent call last):
      File "zdtm.py", line 2161, in <module>
        do_run_test(tinfo[0], tinfo[1], tinfo[2], tinfo[3])
      File "zdtm.py", line 1549, in do_run_test
        cr(cr_api, t, opts)
      File "zdtm.py", line 1264, in cr
        test.pre_dump_notify()
      File "zdtm.py", line 490, in pre_dump_notify
        fdin.write(struct.pack("i", 0))
    TypeError: write() argument 1 must be unicode, not str
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
- 09 Jul, 2018 22 commits
Adrian Reber authored
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Pavel Tikhomirov authored
The idea of the test is:
1) mmap a separate page and put the variable there, so that other usage does not dirty this region. Initialize the variable with VALUE_A.
2) fork a child with the special pid == CHILD_NS_PID. Only if it is the first child, overwrite the variable with VALUE_B.
3) wait for the end of the next predump or the end of restore with the test_wait_pre_dump_ack/test_wait_pre_dump pair and kill our child. Note: the memory region is "clean" in the parent.
4) goto (2) unless the end of cr is reported by test_waitpre

So on the first iteration the child with pid CHILD_NS_PID was dumped with VALUE_B; on all other iterations and on the final dump another child with the same pid exists, but with VALUE_A. On all iterations after the first one, though, this memory region is "clean". So criu before the fix would have restored VALUE_B, taking it from the first child's image, but it should restore VALUE_A.

Note: the child in its turn waits for termination and checks that the variable value doesn't change after c/r.

We should run the test with at least one predump to trigger the problem:

    [root@snorch criu]# ./test/zdtm.py run --pre 1 -k always -t zdtm/transition/pid_reuse
    Checking feature ns_pid
    Checking feature ns_get_userns
    Checking feature ns_get_parent
    === Run 1/1 ================ zdtm/transition/pid_reuse
    ===================== Run zdtm/transition/pid_reuse in ns ======================
    DEP pid_reuse.d
    CC pid_reuse.o
    LINK pid_reuse
    Start test
    Test is SUID
    ./pid_reuse --pidfile=pid_reuse.pid --outfile=pid_reuse.out
    Run criu pre-dump
    Send the 10 signal to 52
    Run criu dump
    Run criu restore
    Send the 15 signal to 73
    Wait for zdtm/transition/pid_reuse(73) to die for 0.100000
    Test output: ================================
    14:47:57.717: 11235: ERR: pid_reuse.c:76: Wrong value in a variable after restore
    14:47:57.717: 4: FAIL: pid_reuse.c:110: Task 11235 exited with wrong code 1 (errno = 11 (Resource temporarily unavailable))
    <<< ================================

https://jira.sw.ru/browse/PSBM-67502

v3: simplify waitpid's status check
v9: switch to test_wait_pre_dump(_ack)
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
*** CID 190172: Uninitialized variables (UNINIT)
/criu/mem.c: 325 in detect_pid_reuse()
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Kirill Tkhai authored
Commit 37e4c7bfc264 fixed arm, ppc and x86 (32bit), but it made the x86_64 definition wrong. Fix that. Also, add a comment to the raw fork() implementation.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Pawel Stradomski authored
Signed-off-by: Pawel Stradomski <pstradomski@google.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Radostin Stoyanov authored
The `port` option is converted from unsigned short integer to network byte order twice. Unfortunately, the 2nd conversion reverses the 1st one. Example (the commented values are for a little-endian host, where htons() swaps the two bytes):

    #include <stdio.h>
    #include <arpa/inet.h>
    #include <stdlib.h>

    int main(void)
    {
        printf("%d\n", htons(atoi("1234")));        /* 53764 */
        printf("%d\n", htons(htons(atoi("1234")))); /* 1234 */
        return 0;
    }
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
-
Mike Rapoport authored
The criu_status_in descriptor is not always used, and it may be -1 when the signal handler closes it.

With lazy-pages we hit a corner case which clobbers the errno value. This happens when we resume the process inside a glibc syscall wrapper and get the signal before the page containing errno is copied. In this case, the signal handler is invoked before the syscall return value is written to errno, and the actual value of errno seen by the process becomes -EBADF because of the close(-1) in the signal handler.

Let's ensure that close() in the signal handler does not fail, to make Jenkins happier while the proper solution for the lazy-pages issue is found.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
Cc: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Andrei Vagin authored
Reported-by: Dmitry Safonov <dima@arista.com>
Cc: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Mike Rapoport authored
kerndat_init() is now called before the jump to the action handler. This allows us to use kdat directly without calling the corresponding kerndat_*() methods.

travis-ci: success for lazy-pages: update checks for availability of userfaultfd (rev3)
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
-
Andrei Vagin authored
Call "ps axf" if waitpid() has been running for more than 10 seconds
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Pavel Tikhomirov authored
1) fix sfle memory leak on get_fle_for_scm error
2) fix gfd open descriptor leak on get_fle_for_scm error
3-6) fix buf memory leak on read and pwrite errors
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
-
Cyrill Gorcunov authored
On Fedora rawhide seccomp_metadata for some reason is not defined (while in the kernel it was introduced together with PTRACE_SECCOMP_GET_METADATA). So let's do a trick for a while -- define our own alias. Once the system headers settle down we might find a more suitable solution. Because it's a part of the kernel API, we're on the safe side.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
After the CR_STATE_RESTORE_SIGCHLD stage is triggered we are not allowed to exit; just yield a BUG instead.
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
seccomp_metadata may already be defined in the system ptrace.h header on recent kernels, so include it.

https://github.com/checkpoint-restore/criu/issues/486#event-1628406918
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
*** CID 190178: Null pointer dereferences (NULL_RETURNS)
/criu/seccomp.c: 296 in collect_filters()
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Credential commitment affects the dumpable flag and the pdeath signal, so we have to move their restore after restore_creds, just like we have in __export_restore_task (i.e. for the group leader).

https://jira.sw.ru/browse/PSBM-84198
Acked-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Looking up a pid in a nested pidns is supposed to be done for non-group-leaders only. __export_restore_thread does this check on its own, so we don't have to do a similar lookup, especially for the group leader, where the pids in args were never valid.
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
- Fix typo in sizeof() operand
- Eliminate redundant prctl calls if no PTRACE_SECCOMP_GET_METADATA detected
Reported-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
Andrew proposed the test which actually triggered the issue in the current seccomp series, so let's run it on a regular basis.
Suggested-by: Andrey Vagin <avagin@virtuozzo.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-
Cyrill Gorcunov authored
When deciding whether to call PTRACE_O_SUSPEND_SECCOMP on the tid, we should take into account whether there is at least one thread which has seccomp mode enabled; otherwise we might miss the filter suspension, and the restore procedure might break because criu's own syscalls get filtered out.

At the same time we should move the seccomp restore for threads to take place after the CR_STATE_RESTORE_SIGCHLD state, so that the main criu code attaches to the threads and sets up the seccomp suspension flag before we start restoring the filters.
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
-