Commit 2e00d019 authored by Cyrill Gorcunov

docs: Add internals details

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
parent 60c9235f
crtools internals
=================
What CRtools is
---------------
In short, crtools is a utility to checkpoint/restore (CR) processes. Unlike CR
implemented completely in kernel space, it tries to achieve the same goal while
operating in user space.
Since this tool (and the overall concept) is under heavy development, there are
some known limitations:
- Only pure x86-64 environment is supported, no IA32 emulation.
- There is no way to use cgroups freezer facility.
- No network or IPC CR supported.
At the moment, CR of the following resources is supported:
- Process tree
- Files (with some limitations)
- Pipes
- Memory
Basic design
------------
Checkpoint
~~~~~~~~~~
The checkpoint procedure relies on the /proc file system, which is where
crtools gets all the information it needs. This includes:
- File descriptors (via /proc/$pid/fd and /proc/$pid/fdinfo).
- Pipes parameters.
- Memory maps (via /proc/$pid/maps).
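To illustrate what these /proc entries contain, here is a minimal Python sketch.
It is not crtools code (crtools itself is written in C), and the function names
parse_maps, parse_fdinfo and read_fds are hypothetical. It assumes the usual Linux
text formats: one "start-end perms offset dev inode [path]" line per VMA in
/proc/$pid/maps, and "pos:" (decimal) and "flags:" (octal) lines in
/proc/$pid/fdinfo/$fd.

```python
import os

def parse_maps(text):
    """Parse the text of /proc/$pid/maps into a list of VMA dicts."""
    vmas = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Line format: start-end perms offset dev inode [path]
        fields = line.split(None, 5)
        start, end = (int(x, 16) for x in fields[0].split("-"))
        vmas.append({
            "start": start,
            "end": end,  # exclusive
            "perms": fields[1],
            "offset": int(fields[2], 16),
            "path": fields[5].strip() if len(fields) > 5 else "",
        })
    return vmas

def parse_fdinfo(text):
    """Parse the text of /proc/$pid/fdinfo/$fd into position and flags."""
    info = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        info[key.strip()] = value.strip()
    # "pos" is printed in decimal, "flags" in octal.
    return {"pos": int(info["pos"]), "flags": int(info["flags"], 8)}

def read_fds(pid):
    """Map every open fd of `pid` to its link target, position and flags."""
    result = {}
    for name in os.listdir(f"/proc/{pid}/fd"):
        try:
            target = os.readlink(f"/proc/{pid}/fd/{name}")
            with open(f"/proc/{pid}/fdinfo/{name}") as f:
                info = parse_fdinfo(f.read())
        except OSError:
            continue  # the descriptor was closed while we were scanning
        result[int(name)] = dict(info, target=target)
    return result
```

On a Linux machine, read_fds(os.getpid()) lists the descriptors of the calling
process itself. For pipes the link target has the form "pipe:[inode]", which is
one way to tell which descriptors refer to the same pipe.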
The process dumper (let's call it the "dumper") performs the following steps during
the checkpoint stage:
- The $pid of a process group leader is obtained from the command line.
- Using this $pid, the dumper walks through /proc/$pid/status and gathers
  the children's $pids recursively. At the end we have a complete process tree.
- Then it takes every $pid from the process tree, sends SIGSTOP to each process
  found, and performs the following steps on each $pid:
  - Collects VMA areas by parsing /proc/$pid/maps.
  - Seizes the task via a relatively new ptrace interface. Seizing a task means
    putting it into a special state in which the task has no idea that it is
    being operated on via ptrace.
  - Core parameters of the task (such as registers and friends) are dumped via
    the ptrace interface and by parsing the /proc/$pid/stat entry.
  - The dumper injects parasite code into the task via the ptrace interface. This
    allows us to dump the task's pages right from within the task's address space.
    The injection procedure is pretty simple:
    - The dumper scans the executable VMA areas of the task (which were previously
      collected) and tests whether there is room for a few instructions.
    - Then (by ptrace as well) it substitutes the original code with new
      instructions and creates a new VMA area inside the process address space.
    - Finally the parasite code gets copied into the new VMA, and the original
      code that was modified during the parasite bootstrap procedure is restored.
  - Then the dumper flushes the contents of the task's pages to a file and drops
    the parasite code block completely, since we don't need it anymore.
  - Once the parasite code is removed, the task gets unseized via a ptrace call
    but remains stopped.
  - The dumper writes out the parameters of opened files and pipes (flushing data
    to disk if needed).
- SIGCONT is sent to every task in the process tree (to continue execution).
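The tree-gathering, SIGSTOP and SIGCONT steps above can be sketched in Python as
follows. This is an illustration rather than the crtools implementation: instead
of reading a list of children from /proc/$pid/status, it reconstructs the tree
from the parent pid field of /proc/$pid/stat, which stock kernels provide. All
function names here are hypothetical, and the ptrace and parasite steps are left
out entirely.

```python
import os
import signal

def parse_ppid(stat_text):
    """Return the parent pid from the text of /proc/$pid/stat."""
    # The comm field is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the last ')'.
    rest = stat_text.rsplit(")", 1)[1].split()
    return int(rest[1])  # rest[0] is the state, rest[1] is the ppid

def scan_parents():
    """Build a {pid: ppid} table for every process visible in /proc."""
    parents = {}
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            with open(f"/proc/{name}/stat") as f:
                parents[int(name)] = parse_ppid(f.read())
        except OSError:
            continue  # the process exited while we were scanning
    return parents

def collect_tree(root, parents):
    """Return root and all of its descendants, parents before children."""
    children = {}
    for pid, ppid in parents.items():
        children.setdefault(ppid, []).append(pid)
    tree, queue = [], [root]
    while queue:
        pid = queue.pop(0)
        tree.append(pid)
        queue.extend(sorted(children.get(pid, [])))
    return tree

def stop_tree(tree):
    """Send SIGSTOP to every task in the tree."""
    for pid in tree:
        os.kill(pid, signal.SIGSTOP)

def continue_tree(tree):
    """Send SIGCONT to every task in the tree."""
    for pid in tree:
        os.kill(pid, signal.SIGCONT)
```

A caller would run collect_tree(pid, scan_parents()), then stop_tree() before
dumping and continue_tree() afterwards. Note that the sketch does not handle a
task forking between the scan and the SIGSTOP; a real dumper has to deal with
that race.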
Restore
~~~~~~~
The restore procedure (aka the restorer) proceeds through the following steps:
- The process tree is read from a file.
- To restore the process tree, the restorer executes the clone(CLONE_CHILD_USEPID)
  syscall, which creates a process with the specified $pid. Note that if for some
  reason a process with the same $pid is already up and running, the restoration
  procedure will refuse to proceed.
- Files and pipes are restored (i.e. opened with the file descriptors they had at
  checkpoint time and positioned exactly as they were before; if a pipe had
  data buffered before the checkpoint, that data is written back into the pipe).
- Restoration of virtual memory (and memory pages) is a bit tricky and is
  implemented in the following steps:
  - The restorer analyzes the current VMA map by parsing the /proc/$pid/maps file.
  - Since we are about to create a completely new memory map, the restorer
    enumerates all VMA entries and looks for a gap (or hole) between VMAs that
    is big enough to hold all the code and parameters needed for the rest of
    the restore procedure.
  - Once such an area is found, the restorer copies its own code and data there.
  - Then the restorer passes execution to the copied code, which in turn:
    - Unmaps the currently active VMAs and maps the areas the process had at
      checkpoint time.
    - Reads page contents back into the newly mapped memory.
    - Prepares an rt-sigreturn frame on the stack and issues the
      __NR_rt_sigreturn syscall, so that the process resumes execution from
      the IP it had at checkpoint time.
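The search for a hole between VMAs can be sketched as a first-fit scan over the
sorted VMA list. This is a minimal Python illustration, not the restorer's actual
code: the function name find_hole and the default address bounds (a 64 KiB lower
limit and an x86-64 user-space upper limit) are assumptions made for the example.

```python
def find_hole(vmas, size, low=0x10000, high=0x7ffffffff000):
    """Return the start of the first gap of at least `size` bytes, or None.

    `vmas` is an iterable of (start, end) pairs with `end` exclusive,
    as found in /proc/$pid/maps. `low` and `high` bound the search and
    are assumed values, not taken from crtools.
    """
    cursor = low
    for start, end in sorted(vmas):
        # Is the gap between the previous VMA and this one big enough?
        if start - cursor >= size:
            return cursor
        cursor = max(cursor, end)
    # Finally, check the space after the last VMA.
    if high - cursor >= size:
        return cursor
    return None
```

Since the same function works on any list of ranges, a caller could also pass the
VMAs that are about to be mapped, so that the chosen hole does not collide with
them; the text above does not say whether crtools does this.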
Kernel area
-----------
Although CR is implemented in user space, some help from the Linux kernel
is still required, so the following kernel patches are needed:
- A new directory, /proc/$pid/map_files, which allows CR to find and restore
  anonymous shared memory areas.
- An explicit "Children:" line added to the /proc/$pid/status file. This
  simplifies the code significantly (the kernel already has this information
  but does not yet export it).
- The ability to call clone() with a specified $pid.
- start_data, end_data and a few more members of mm_struct:
  - Export added to /proc/$pid/stat.
  - Import implemented via new prctl codes.
- The ability to map the vDSO at a predefined address (implemented via
  a new prctl code as well).