- 01 Oct, 2014 5 commits
-
-
Pavel Emelyanov authored
The setns() syscall (called by switch_ns()) can be extremely slow. If we call it two or more times from the same task, the kernel synchronously goes into a very slow routine called synchronize_rcu() while trying to put a reference on the old namespaces. To avoid doing this more than once, I propose creating all per-ns sockets in one place with one setns call. In this patch, the nl diag socket used to collect other sockets is created this way. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
-
Pavel Emelyanov authored
Right now we don't support multiple net namespaces, but some day we will. Besides, we already have logic to distinguish the no-namespace case from the one-namespace case, so this walking already makes sense. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
-
Pavel Emelyanov authored
And move sockets collection there. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
-
Pavel Emelyanov authored
Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
-
Pavel Emelyanov authored
We have a use-after-free in predump code: first free_pstree() is called in pre_dump_tasks(), then we go to irmap_predump_run(), which may call lookup_irmap(), which in turn dereferences the root_item to get the root mount ns fd. But the problem is bigger than that. After we've released the tasks (done before freeing pstree on predump) we can no longer access them by PIDs, so keeping the root_item after the irmap scan is not a fix. The fix is to get the root fd before releasing the tasks and use it in the irmap scanner. Caught recently on the iterative inotify_irmap test. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Acked-by:
Andrew Vagin <avagin@parallels.com>
-
- 30 Sep, 2014 21 commits
-
-
Andrey Vagin authored
man 2 open: """ mode specifies the permissions to use in case a new file is created. This argument must be supplied when O_CREAT or O_TMPFILE is specified in flags; """ Cc: Konstantin Neumoin <kneumoin@parallels.com> Signed-off-by:
Andrey Vagin <avagin@openvz.org> Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
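The man page rule quoted above can be illustrated with a minimal sketch. The helper name create_private_file() and the 0600 mode are assumptions made for illustration and are not taken from the patch itself.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: create (or truncate) a file for writing.
 * Because O_CREAT is in the flags, the third "mode" argument must be
 * passed; without it the new file's permissions are taken from
 * whatever garbage happens to be in the argument slot.
 */
static int create_private_file(const char *path)
{
	return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
}
```

The effective permissions are still filtered by the process umask, so the file may end up with fewer bits than requested, never more.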
-
Andrey Vagin authored
Reported-by:
Konstantin Neumoin <kneumoin@parallels.com> Cc: Konstantin Neumoin <kneumoin@parallels.com> Signed-off-by:
Andrey Vagin <avagin@openvz.org> Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
-
Pavel authored
Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
-
Andrey Vagin authored
""" Travis CI is configured by adding a file named .travis.yml, which is a YAML format text file, to the root directory of the GitHub repository. Travis CI automatically detects when a commit has been made and pushed to a GitHub repository that is using Travis CI, and each time this happens, it will try to build the project and run tests. """ https://en.wikipedia.org/wiki/Travis_CI
Currently Travis CI builds criu for x86_64 and ARM.
v2: move travis-ci.sh into scripts
v3: fix path to the script in the script
Signed-off-by:
Andrey Vagin <avagin@openvz.org> Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
-
Pavel Emelyanov authored
The restore times look like

Before patch:
  futex:    370  3.554482 (84.2%)
  umount:    41  0.234796 (5.6%)
  read:    4737  0.113987 (2.7%)
  recvmsg:   43  0.100083 (2.4%)
  wait4:     10  0.033344 (0.8%)

After patch:
  futex:    187  1.547642 (72.9%)
  umount:    41  0.234595 (11.0%)
  recvmsg:   43  0.075738 (3.6%)
  flock:     42  0.038696 (1.8%)
  clone:     35  0.037699 (1.8%)

Most of the time we wait for other processes to restore, but that's OK (would only affect parallel restore). And we see that read-s really go away (onto 7th position). Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Acked-by:
Cyrill Gorcunov <gorcunov@openvz.org>
-
Pavel Emelyanov authored
Dump times (top-5) look like

Before patch:
  writev: 1595  0.048337 (15.1%)
  openat: 1326  0.041976 (13.1%)
  close:  1434  0.034661 (10.8%)
  read:    988  0.028760 (9.0%)
  wait4:   170  0.028271 (8.8%)

After patch:
  openat: 1326  0.040010 (16.4%)
  close:  1434  0.030039 (12.3%)
  read:    988  0.025827 (10.6%)
  wait4:   170  0.025549 (10.5%)
  ptrace:  834  0.021624 (8.9%)

So write-s go away from the top list (down to 8th position). Funny thing is that all object writes get merged with the magic writes, so the total amount of write()-s (not writev-s) in the strace remains intact :) Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Acked-by:
Cyrill Gorcunov <gorcunov@openvz.org>
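The merging of small writes described above can be sketched as a tiny write buffer. This is a generic illustration only: struct wbuf, wb_init(), wb_write(), wb_flush() and the 4096-byte size are hypothetical names and values, not the code from the patch.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define WB_SIZE 4096	/* assumed buffer size, for illustration */

/* Hypothetical write buffer; not CRIU's actual structure. */
struct wbuf {
	int fd;
	size_t used;		/* bytes collected and not yet written */
	char data[WB_SIZE];
};

static void wb_init(struct wbuf *wb, int fd)
{
	wb->fd = fd;
	wb->used = 0;
}

/* Push everything collected so far to the descriptor. */
static int wb_flush(struct wbuf *wb)
{
	size_t off = 0;

	while (off < wb->used) {
		ssize_t n = write(wb->fd, wb->data + off, wb->used - off);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		off += (size_t)n;
	}
	wb->used = 0;
	return 0;
}

/* Collect data in memory; only a full buffer triggers a write(). */
static int wb_write(struct wbuf *wb, const void *buf, size_t len)
{
	const char *p = buf;

	while (len) {
		size_t room = WB_SIZE - wb->used;
		size_t chunk = len < room ? len : room;

		memcpy(wb->data + wb->used, p, chunk);
		wb->used += chunk;
		p += chunk;
		len -= chunk;
		if (wb->used == WB_SIZE && wb_flush(wb) < 0)
			return -1;
	}
	return 0;
}
```

With such a buffer, a small header write followed by a small object write reaches the kernel as a single write() at flush time, which matches the effect on the syscall counts reported above.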
-
Pavel Emelyanov authored
Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Acked-by:
Cyrill Gorcunov <gorcunov@openvz.org>
-
Pavel Emelyanov authored
For reads and writes the names pos and bleft will have strange meaning, so rename them into something more appropriate. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Acked-by:
Cyrill Gorcunov <gorcunov@openvz.org>
-
Pavel Emelyanov authored
We have some images that store raw data together with the pb objects (and one that just stores raw data) and use custom access to this. E.g. pipe-data images splice data into them and the sk-queue one lseeks the image for queue packets. For those, mixing buffered mode with raw access may lead to trouble. Explicitly mark such images, so that the buffering (next patches) handles them carefully. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Acked-by:
Cyrill Gorcunov <gorcunov@openvz.org>
-
Pavel Emelyanov authored
The pb_(read|write)-s will stop using plain fd soon. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Acked-by:
Cyrill Gorcunov <gorcunov@openvz.org>
-
Pavel Emelyanov authored
We want to have buffered images to speed up dump and, slightly, restore. Right now we use plain file descriptors to write and read images to/from. Making them buffered cannot be gracefully done on plain fds, so introduce a new class. This will also help if (when?) we want to do more complex changes with images, e.g. store them all in one file or send them directly to the network. For now the cr_img just contains one int _fd variable. This patch changes the prototype of open_image() to return struct cr_img *, pb_(read|write)* to accept one, and fixes the compilation of the rest of the code :) Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Acked-by:
Cyrill Gorcunov <gorcunov@openvz.org>
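A minimal sketch of the idea of wrapping a plain descriptor into an image object might look as follows. Only struct cr_img and its int _fd member come from the message above; the helpers img_open_sketch(), img_write_sketch() and img_close_sketch() are hypothetical and do not reflect the real open_image() or pb_write() prototypes.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* As described in the message: for now just one descriptor inside. */
struct cr_img {
	int _fd;
};

/* Hypothetical opener: returns an object instead of a bare int fd. */
static struct cr_img *img_open_sketch(const char *path, int flags)
{
	struct cr_img *img = malloc(sizeof(*img));

	if (!img)
		return NULL;

	img->_fd = open(path, flags, 0600);
	if (img->_fd < 0) {
		free(img);
		return NULL;
	}
	return img;
}

/* Callers pass the object; only this layer touches the descriptor. */
static ssize_t img_write_sketch(struct cr_img *img, const void *buf, size_t len)
{
	return write(img->_fd, buf, len);
}

static void img_close_sketch(struct cr_img *img)
{
	close(img->_fd);
	free(img);
}
```

Since callers never see the descriptor, buffering or a different backing store could later be added behind the same three calls without touching them.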
-
Pavel Emelyanov authored
Ugly, but it's for easier further patching. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Acked-by:
Cyrill Gorcunov <gorcunov@openvz.org>
-
Pavel Emelyanov authored
The same -- int-fd will soon go away, so return the explicit int -1 instead of it. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Acked-by:
Cyrill Gorcunov <gorcunov@openvz.org>
-
Pavel Emelyanov authored
There will be no int-fd soon, so one more preparation to this fact. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Acked-by:
Cyrill Gorcunov <gorcunov@openvz.org>
-
Pavel Emelyanov authored
Since we're going to switch from int-fd-s to class-image soon the fdset name will not fit into the new terminology. This patch is
  sed -e 's/fdset/imgset/g' -i *
  sed -e 's/imgset_fd/img_from_set/g' -i *
  git mv include/fdset.h include/imgset.h
Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Acked-by:
Cyrill Gorcunov <gorcunov@openvz.org>
-
Pavel Emelyanov authored
This is to simplify the change from int fd to more generic image class data-type. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Acked-by:
Cyrill Gorcunov <gorcunov@openvz.org>
-
Pavel Emelyanov authored
The write_img_buf will be used only for images writing, while in this place we just have a raw file descriptor. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Acked-by:
Cyrill Gorcunov <gorcunov@openvz.org>
-
Pavel Emelyanov authored
We drop the O_OPT from flags and will drop one more. So instead of a set of bools let's keep a copy of the flags at hand. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Acked-by:
Cyrill Gorcunov <gorcunov@openvz.org>
-
Cyrill Gorcunov authored
We've a special helper xrealloc_safe for reallocs. Signed-off-by:
Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
-
Cyrill Gorcunov authored
We prefer x* helpers because they print error in case of allocation failures. Signed-off-by:
Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
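To show what "print an error in case of allocation failure" can mean in practice, here is a minimal sketch of such wrappers. The names xmalloc_sketch() and xrealloc_keep_sketch() and the message text are assumptions; the real x* helpers and xrealloc_safe in the tree may be implemented differently.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical malloc wrapper: logs the failure so callers need not. */
static void *xmalloc_sketch(size_t size)
{
	void *p = malloc(size);

	if (!p)
		fprintf(stderr, "Error: can't allocate %zu bytes\n", size);
	return p;
}

/*
 * Hypothetical realloc wrapper: the caller's pointer is updated only
 * on success, so the old block is not leaked or lost on failure.
 * Returns 0 on success, -1 on failure.
 */
static int xrealloc_keep_sketch(void **pptr, size_t size)
{
	void *n = realloc(*pptr, size);

	if (!n) {
		fprintf(stderr, "Error: can't reallocate to %zu bytes\n", size);
		return -1;
	}
	*pptr = n;
	return 0;
}
```

The second helper avoids the classic p = realloc(p, n) pattern, which overwrites the only reference to the old block when realloc() returns NULL.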
-
Cyrill Gorcunov authored
While converting the reading of the data stream to bfd, the @buf member was left untouched, leading to incorrect data being read. Fix it by setting up the proper one, i.e. @str itself; otherwise dumping of timerfd files fails. Signed-off-by:
Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
-
- 29 Sep, 2014 8 commits
-
-
Pavel Emelyanov authored
I plan to re-use the bfd engine for images buffering. Right now this engine uses one buffer that gets reused by all bfdopen()-s. This works for current usage (one-by-one proc files access), but for images we'll need more buffers. So this patch just puts buffers in a list and organizes a stupid R-R with refill on it.
v2:
  Check for buffer allocation errors
  Print buffer mem pointer in debug
Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Acked-by:
Andrew Vagin <avagin@parallels.com>
-
Pavel Emelyanov authored
The open_pid_proc engine knows itself how to cache per-pid descriptors. There is no need to close it by hand. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
-
Pavel Emelyanov authored
When dumping tasks we do a lot of open_proc()-s and to speed this up the /proc/pid directory is opened first and the fd is kept cached. So next open_proc()-s do just openat(cached_fd, name). The thing is that we sometimes call open_proc(PROC_SELF) in between and proc helpers cache the /proc/self too. As a result we have a bunch of open(/proc/pid) close() open(/proc/self) close() see-saw-s in the middle of dumping tasks. To fix this we may cache the /proc/self separately from the /proc/pid descriptor. This eliminates quite a lot of pointless open-s and close-s. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
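The separate caching described above can be sketched like this. Everything here (struct proc_cache, SKETCH_PROC_SELF, proc_dir_fd(), open_proc_file()) is a hypothetical illustration of the idea and not the actual proc helpers code.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_PROC_SELF 0	/* hypothetical marker meaning /proc/self */

/* Two independent slots: one for /proc/<pid>, one for /proc/self. */
struct proc_cache {
	pid_t pid;	/* pid the pid_fd slot currently refers to */
	int pid_fd;	/* cached /proc/<pid> directory fd, or -1 */
	int self_fd;	/* cached /proc/self directory fd, or -1 */
};

#define PROC_CACHE_INIT { 0, -1, -1 }

static int proc_dir_fd(struct proc_cache *c, pid_t pid)
{
	char path[64];

	if (pid == SKETCH_PROC_SELF) {
		/* Own slot: asking for self never evicts the pid entry. */
		if (c->self_fd < 0)
			c->self_fd = open("/proc/self", O_RDONLY);
		return c->self_fd;
	}

	if (c->pid_fd >= 0 && c->pid == pid)
		return c->pid_fd;	/* cache hit, no open() needed */

	if (c->pid_fd >= 0)
		close(c->pid_fd);

	snprintf(path, sizeof(path), "/proc/%d", (int)pid);
	c->pid_fd = open(path, O_RDONLY);
	c->pid = pid;
	return c->pid_fd;
}

/* Open /proc/<pid>/<name> relative to the cached directory fd. */
static int open_proc_file(struct proc_cache *c, pid_t pid, const char *name)
{
	int dir = proc_dir_fd(c, pid);

	return dir < 0 ? -1 : openat(dir, name, O_RDONLY);
}
```

With a single shared slot, each switch between a task and self would close one directory and open the other; with two slots both stay open for the whole dump.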
-
Pavel Emelyanov authored
We have a bug. If someone opens proc with open_pid_proc or alike with PROC_SELF or a real PID before going to restore fds, then the fd cached by proc helpers would be cached in fd 0 (we close all fds beforehand) and it may clash with restored fds. We don't hit this right now simply due to being too lucky -- we call open_proc(PROC_GEN) on "locks" which first closes the cached per-pid descriptor and then reports back just the /proc one which sits in service area. But once we change this (next patch) things would get broken. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
-
Pavel Emelyanov authored
Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
-
Pavel Emelyanov authored
We have a, well, issue with how we calculate the vma's mnt_id. Right now we get one via the criu-side file descriptor obtained by opening the /proc/pid/map_files/ link. The problem is that these descriptors are 'merged' or 'borrowed' by adjacent vmas from previous ones. Thus, getting the mnt_id value for each of them makes no sense -- these files are the same. So move this mnt_id getting earlier into vma parsing code. This brings a potential problem -- if we have two adjacent vmas mapping the same inode (dev:ino pair) but living in different mount namespaces -- this check would produce a wrong result. "Wrong" from the perspective that on restore the correct file would be opened from the wrong namespace. I propose to live with it, since this is not worse than the --evasive-devices option, it's _very_ unlikely, but saves a lot of openings. Note that in case an app switched mount namespace and then mapped some new library (with dlopen) things would work correctly -- new vmas will likely be not adjacent and for different dev:ino. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
-
Pavel Emelyanov authored
We have non-obvious handling of the vm_file_fd/vm_socket_id pair and the vma->file_borrowed. Comment these in the structure. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
-
Pavel Emelyanov authored
We have some fields that are dump-only and some that are restore-only (quite a lot of them actually). Reshuffle them on the vma_area to explicitly show which one is which. And rename some of them for easier grep. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
-
- 24 Sep, 2014 2 commits
-
-
Andrey Vagin authored
Reported-by: Mr Jenkins Signed-off-by:
Andrey Vagin <avagin@openvz.org> Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
-
Andrey Vagin authored
Currently this optimization skips unscanned data and doesn't work. Let's skip scanned data only. Reported-by: Jenkins Signed-off-by:
Andrey Vagin <avagin@openvz.org> Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
-
- 23 Sep, 2014 4 commits
-
-
Pavel Emelyanov authored
Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
-
Pavel Emelyanov authored
Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
-
Pavel Emelyanov authored
Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
-
Pavel Emelyanov authored
This sounds strange, but we kinda need one. Here's the justification for that. We heavily open /proc/pid/foo files. To speed things up we do pid_dir = open("/proc/pid") then openat(pid_dir, foo). This really saves time on big trees, up to 10%.

Sometimes we need line-by-line scan of these files, and for that we currently use the fdopen() call. It takes a file descriptor (obtained with openat from above) and wraps one into a FILE*. The problem with the latter is that fdopen _always_ mmap()s a buffer for reads and this buffer always (!) gets unmapped back on fclose(). This pair of mmap() + munmap() eats time on big trees, up to 10% in my experiments with p.haul tests. The situation is made even worse by the fact that each fgets on the file results in a new page allocated in the kernel (since the mapping is new). And also this fgets copies data, which is not a big deal, but for e.g. the smaps file this results in ~8K bytes being just copied around.

Having said that, here's a small but fast way of reading a descriptor line-by-line using a big buffer for reducing the amount of read()s. After all per-task fopen_proc()-s get reworked on this engine (next 4 patches) the results on the p.haul test would be

Syscall   Calls  Time     (% of time)
Now:
  mmap:     463  0.012033 (3.2%)
  munmap:   447  0.014473 (3.9%)
Patched:
  munmap:    57  0.002106 (0.6%)
  mmap:      74  0.002286 (0.7%)

The amount of read()s and open()s doesn't change since FILE* also uses a page-sized buffer for reading. Also this eliminates some amount of lseek()s and fstat()s the fdopen() does every time to catch up with the file position and to determine what sort of buffering it should use (for terminals it's \n-driven, for files it's not). Signed-off-by:
Pavel Emelyanov <xemul@parallels.com>
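The line-by-line reading approach argued for above can be sketched roughly as follows. This is an illustration under stated assumptions, not the engine from the patch: struct line_reader, lr_init(), lr_readline() and the 4096-byte buffer size are all hypothetical, and the buffer here is embedded in the structure rather than mapped.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define LR_BUF_SIZE 4096	/* assumed buffer size, for illustration */

/* Hypothetical reader state; not the actual bfd structure. */
struct line_reader {
	int fd;
	char buf[LR_BUF_SIZE + 1];	/* +1 keeps room for a final NUL */
	size_t start;			/* offset of first unconsumed byte */
	size_t len;			/* unconsumed bytes after start */
	int eof;
};

static void lr_init(struct line_reader *lr, int fd)
{
	lr->fd = fd;
	lr->start = 0;
	lr->len = 0;
	lr->eof = 0;
}

/*
 * Returns the next line, NUL-terminated with the newline stripped, or
 * NULL on EOF, on read error, or if a line does not fit the buffer.
 * The returned pointer lives inside the buffer (no copy is made) and
 * is valid until the next call.
 */
static char *lr_readline(struct line_reader *lr)
{
	ssize_t n;

	for (;;) {
		char *line = lr->buf + lr->start;
		char *nl = memchr(line, '\n', lr->len);

		if (nl) {
			size_t used = (size_t)(nl - line) + 1;

			*nl = '\0';
			lr->start += used;
			lr->len -= used;
			return line;
		}

		if (lr->eof) {
			if (lr->len == 0)
				return NULL;
			/* Last line without a trailing newline. */
			line[lr->len] = '\0';
			lr->start += lr->len;
			lr->len = 0;
			return line;
		}

		/* No full line yet: move the tail to the front, refill. */
		if (lr->start) {
			memmove(lr->buf, line, lr->len);
			lr->start = 0;
		}
		if (lr->len == LR_BUF_SIZE)
			return NULL;	/* line longer than the buffer */

		do {
			n = read(lr->fd, lr->buf + lr->len,
				 LR_BUF_SIZE - lr->len);
		} while (n < 0 && errno == EINTR);

		if (n < 0)
			return NULL;
		if (n == 0)
			lr->eof = 1;
		else
			lr->len += (size_t)n;
	}
}
```

A caller would open the file with openat() on the cached directory descriptor, call lr_init() once, and loop on lr_readline() until it returns NULL. Each read() fetches up to a whole buffer, and lines are handed out in place.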
-