    test, unix: Exhaustive testing of states (v2) · e098b119
    Pavel Emelyanov authored
    By exhaustive testing I mean a test suite that generates as many
    states to try to C/R as possible by trying all the possible sequences
    of system calls. Since such a generation, if done on all the Linux API
    we support in CRIU, would produce bazillions of processes, I propose to
    start with something simple.
    
    As a starting point -- unix stream sockets with abstract names that
    can be created and used by a single process :)
    
    The script generates the situations unix sockets can get into by
    using a pre-defined set of system calls. In this patch the syscalls
    are socket, listen, bind, accept, connect and send. Also the number
    of system calls to use (i.e. the depth of the tree) is limited by
    the --depth option.
    
    There are three things that can be done with a generated 'state':
    
    I) Generate :) and show
    
    Generation is done by recursively doing everything that is possible
    (and makes sense) in a given state. To reduce the size of the tree
    some meaningless branches are cut, e.g. creating a socket and closing
    it right after that, creating two similar sockets one-by-one and a
    few more.
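    A minimal sketch of this recursive, depth-limited generation, assuming a toy model with only socket(), bind(), listen() and close() on stream sockets. The names key, moves and generate are hypothetical and not taken from the actual script; the second pruning rule is just one possible reading of "creating two similar sockets one-by-one".

```python
# Toy model: each socket is a descriptor string such as "SA", "SB" or "SBL";
# a state is a tuple of such strings in creation order.


def key(state):
    # Canonical state name: descriptors sorted and joined with "_", so states
    # that differ only in socket creation order collapse into one entry.
    return "_".join(sorted(state))


def replace(state, i, new):
    # Copy of state with socket i replaced by the tuple new
    # (an empty tuple removes the socket, i.e. close()).
    return state[:i] + new + state[i + 1:]


def moves(state):
    """Yield (call, next_state) for every syscall applicable to state."""
    yield ("socket", None), state + ("SA",)
    for i, sk in enumerate(state):
        if sk == "SA":
            yield ("bind", i), replace(state, i, ("SB",))
        if sk == "SB":
            yield ("listen", i), replace(state, i, ("SBL",))
        yield ("close", i), replace(state, i, ())


def generate(depth):
    """Depth-limited DFS; maps each unique state name to one call path."""
    seen = {}

    def walk(state, path):
        name = key(state)
        if state and name not in seen:
            seen[name] = list(path)
        if len(path) == depth:
            return
        last = path[-1] if path else None
        just_created = last is not None and last[0] == "socket"
        for call, nxt in moves(state):
            # Prune: closing the socket created by the previous call.
            if call[0] == "close" and just_created and call[1] == len(state) - 1:
                continue
            # Prune: two socket() calls in a row (hypothetical rule).
            if call[0] == "socket" and just_created:
                continue
            walk(nxt, path + [call])

    walk((), [])
    return seen
```

    In this toy model generate(3) finds the states SA, SB, SA_SB and SBL, each mapped to one syscall path that leads to it.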
    
    Shown on the screen is a cryptic string, e.g. 'SA-CX-MX_SBL',
    describing the sockets in the state. This is how it can be decoded:
    
     - sockets are delimited with _
     - first comes the type (S -- stream, D -- datagram)
     - next comes the name state (A -- no name, B -- with name, X -- the
       socket is not in the FD table, i.e. closed or not yet accepted)
     - next may come the letter L, meaning that the socket is listening
     - -Cx -- the socket is connected and x is the peer's name state
     - -Ixyz -- the socket has an incoming connections queue and xyz are
       the connect()-ors' name states
     - -Mxyz -- the socket has messages and xyz are the senders' name states
    
    The example above means that we have two sockets:
    
     - SA-CX-MX: stream, with no name, connected to a dead one and with a
       message from a dead one
     - SBL: stream, with name, listening
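    A sketch of a decoder for this notation, assuming exactly the grammar described above (type, name state, optional L, then -C/-I/-M suffixes). decode_state and the dict keys it returns are hypothetical.

```python
import re

# One socket: type, name state, optional "L", then any -C/-I/-M suffixes.
SOCKET_RE = re.compile(
    r"(?P<type>[SD])(?P<name>[ABX])(?P<listen>L?)(?P<rest>(?:-[CIM][ABX]*)*)"
)
SUFFIX_RE = re.compile(r"-([CIM])([ABX]*)")
TYPES = {"S": "stream", "D": "dgram"}


def decode_state(desc):
    """Decode a state string such as 'SA-CX-MX_SBL' into a list of dicts."""
    sockets = []
    for part in desc.split("_"):
        m = SOCKET_RE.fullmatch(part)
        if m is None:
            raise ValueError(f"bad socket description {part!r}")
        sk = {
            "type": TYPES[m["type"]],
            "name": m["name"],        # A: no name, B: named, X: not in FD table
            "listening": m["listen"] == "L",
            "peer": None,             # name state of the connected peer
            "incoming": [],           # name states of pending connect()-ors
            "messages": [],           # name states of message senders
        }
        for kind, states in SUFFIX_RE.findall(m["rest"]):
            if kind == "C":
                sk["peer"] = states
            elif kind == "I":
                sk["incoming"] = list(states)
            else:
                sk["messages"] = list(states)
        sockets.append(sk)
    return sockets
```

    For 'SA-CX-MX_SBL' this yields two stream sockets: the first unnamed with peer "X" and messages ["X"], the second named and listening.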
    
    Next comes the sequence of system calls that leads to the state, e.g.
    this is how to get into the state above:
    
    	socket(S) = 1
    	bind(1, $name-1)
    	listen(1)
    	socket(S) = 2
    	connect(2, $name-1)
    	accept(1) = 3
    	send(2, $message-0)
    	send(3, $message-0)
    	close(3)
    
    The program has created a stream socket, bound it and listened on it,
    then created another stream socket and connected it to the 1st one,
    then accepted the connection, sent two messages in both directions
    and closed the accepted end. So the 2nd (connecting) socket is left
    connected to the dead socket with a message from it.
    
    II) Run the state
    
    This is when the test actually creates a process that makes the
    syscalls required to get into the generated state (and hopefully gets
    into it).
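    A sketch of running a state with Python's socket module, assuming the printed syscalls are encoded as hypothetical tuples like ("bind", 1, "name-1"). Abstract names are built with a leading NUL byte, which is Linux-specific; the pid suffix on names and the b"message-0" payload are assumptions for illustration. The numbers 1, 2 and 3 are the indices from the printout, not real file descriptors.

```python
import os
import socket

# Hypothetical encoding of the printed sequence for SA-CX-MX_SBL.
CALLS = [
    ("socket", "S", 1),
    ("bind", 1, "name-1"),
    ("listen", 1),
    ("socket", "S", 2),
    ("connect", 2, "name-1"),
    ("accept", 1, 3),
    ("send", 2, b"message-0"),
    ("send", 3, b"message-0"),
    ("close", 3),
]

TYPES = {"S": socket.SOCK_STREAM, "D": socket.SOCK_DGRAM}


def abstract(name):
    # Linux abstract namespace: a leading NUL byte, no file on disk.
    # The pid suffix (an assumption) keeps parallel runs from clashing.
    return b"\0" + f"{name}-{os.getpid()}".encode()


def run_state(calls):
    """Perform the syscalls in order; return the sockets still open."""
    socks = {}
    for op, *args in calls:
        if op == "socket":
            kind, idx = args
            socks[idx] = socket.socket(socket.AF_UNIX, TYPES[kind])
        elif op == "bind":
            idx, name = args
            socks[idx].bind(abstract(name))
        elif op == "listen":
            socks[args[0]].listen(8)  # backlog size is arbitrary here
        elif op == "connect":
            idx, name = args
            socks[idx].connect(abstract(name))
        elif op == "accept":
            # Does not block: connect() above already queued the connection.
            lsn, idx = args
            socks[idx], _ = socks[lsn].accept()
        elif op == "send":
            idx, data = args
            socks[idx].send(data)
        elif op == "close":
            socks.pop(args[0]).close()
        else:
            raise ValueError(f"unknown syscall {op!r}")
    return socks
```

    After run_state(CALLS), socket 1 is the named listener (SBL) and socket 2 still holds the message sent by the now-closed socket 3 (SA-CX-MX).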
    
    III) Check C/R of the state
    
    This is the trickiest part when it comes to the R step -- it's not
    clear how to validate that the restored state is correct. Just
    dumping the state, however, is simply a matter of calling criu dump.
    The state's string description is used as the images dir name.
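    A minimal sketch of the dump step, assuming criu is in PATH and the test runs with enough privileges. dump_cmd and dump_state are hypothetical helpers; --tree, --images-dir and --shell-job are real criu options, though whether the actual script passes --shell-job is an assumption.

```python
import os
import subprocess


def dump_cmd(pid, state_desc, images_root="images"):
    """Build a criu dump command whose images dir is named after the state."""
    images_dir = os.path.join(images_root, state_desc)
    return [
        "criu", "dump",
        "--tree", str(pid),          # root of the process tree to dump
        "--images-dir", images_dir,  # e.g. images/SA-CX-MX_SBL
        "--shell-job",               # assumption: test shares our session
    ]


def dump_state(pid, state_desc, images_root="images"):
    cmd = dump_cmd(pid, state_desc, images_root)
    os.makedirs(cmd[cmd.index("--images-dir") + 1], exist_ok=True)
    # A non-zero exit status from criu raises CalledProcessError.
    return subprocess.run(cmd, check=True)
```

    State strings only contain letters, "-" and "_", so they are safe to use as directory names as-is.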
    
    One may choose only to generate the states with the --gen option, or
    only to run them with the --run option. The latter is useful to verify
    that the states generator actually produces valid states. If no option
    is given, the state is also dumped (restore is to come later).
    
    So far the usage experience is as follows:

    - Running with --depth 10 --gen (i.e. just generating all possible
      states that are achievable with 10 syscalls) produces 44 unique
      states in 0.01 seconds. The generated result covers some static
      tests we have in zdtm :)  More generation stats:
       --depth 15 : 1.1 sec   / 72 states
       --depth 18 : 13.2 sec  / 89 states
       --depth 20 : 1 m 8 sec / 101 states
    
    - Running the states and C/R-ing them with criu was checked with
      --depth 9. Criu fails to dump the state SA-CX-MX_SBL (shown above)
      with the error
    
      Error (criu/sk-queue.c:151): recvmsg fail: error: Connection reset by peer
    
    Near-term plans:

    1. Add generators for on-disk socket names (now only abstract).
       An interesting case here is when names overlap and one socket gets
       the name of another, but isn't accessible by it.
    
    2. Add datagram sockets.
       Here it'd be fun to look at how many-to-one connections are
       generated and checked.
    
    3. Add socketpair()-s.
    
    Longer-term plans:

    1. Prune the tree better to allow for deeper tree scans.

    2. Add restore.

    3. Add SCM-s.

    4. Have the exhaustive testing for other resources.
    
    Changes since v1:
    
    * Added DGRAM sockets :)
    
      Dgram sockets are trickier than STREAM ones, as they can reconnect
      from one peer to another. Thus just limiting the tree depth results
      in weird states where a socket just changes peers. In v1 of this
      patch new sockets were added to the state only when the old ones
      reported that there was nothing more to do with them. This limited
      the amount of stupid branches, but this strategy doesn't work with
      dgram because of reconnect. Hence change #2:
    
    * Added the --sockets NR option to limit the number of sockets.

      This allows throwing new sockets into the state at each step, which
      produced a lot of interesting states for DGRAM sockets.
    
    * Added the 'restore' stage and checks after it.
    
      After the process is restored, the script performs as many checks
      as possible, using the expected state description kept in memory.
      The checks verify that the values below, obtained from the real
      sockets, match the expectations in the generated state:
    
       - socket itself
       - name
       - listen state
       - pending connections
       - messages in queue (sender is not checked)
       - connectivity
    
      Connectivity is checked last, when all queues should already be
      empty, by sending control messages and reading them back with the
      socket.recv() method.
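    A sketch of some of these post-restore checks on a single socket, assuming the expected values come from the in-memory state description. check_socket, check_connected and drain are hypothetical helpers; SO_TYPE and SO_ACCEPTCONN are standard Linux socket options. Pending connections and peer name states are not covered here.

```python
import socket


def drain(sk):
    """Read everything currently queued on sk without blocking."""
    chunks = []
    while True:
        try:
            data = sk.recv(4096, socket.MSG_DONTWAIT)
        except OSError:  # EAGAIN (queue empty), ENOTCONN, ECONNRESET, ...
            break
        if not data:  # peer is gone and the queue is empty
            break
        chunks.append(data)
    return b"".join(chunks)


def check_socket(sk, *, sotype, named, listening, messages=b""):
    """Compare a real socket with its expected description.

    Returns the list of mismatching properties (empty when all match).
    Message senders are not checked, and stream messages are compared
    as one concatenated byte string.
    """
    errors = []
    if sk.getsockopt(socket.SOL_SOCKET, socket.SO_TYPE) != sotype:
        errors.append("type")
    # getsockname() is empty for an unnamed unix socket.
    if bool(sk.getsockname()) != named:
        errors.append("name")
    if bool(sk.getsockopt(socket.SOL_SOCKET, socket.SO_ACCEPTCONN)) != listening:
        errors.append("listen")
    # A listening socket has no message queue to inspect.
    if not listening and drain(sk) != messages:
        errors.append("messages")
    return errors


def check_connected(a, b, probe=b"ping"):
    """Done last, once queues are drained: a probe sent on a must reach b."""
    a.send(probe)
    return b.recv(len(probe)) == probe
```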
    
    * Added the --keep option to run all tests even if one of them fails.

      A nice summary is printed at the end.
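    A sketch of the --keep behavior and the final summary, assuming each state's run/dump/restore cycle is wrapped in a hypothetical run_one() callable that raises on failure.

```python
def run_all(states, run_one, keep=False):
    """Run run_one(state) for each state; with keep, continue past failures."""
    passed, failed = [], []
    for state in states:
        try:
            run_one(state)
        except Exception as exc:  # any failure of this state's cycle
            failed.append((state, exc))
            if not keep:
                break
        else:
            passed.append(state)
    # Summary at the end, listing every failed state with its error.
    print(f"{len(passed)} passed, {len(failed)} failed")
    for state, exc in failed:
        print(f"  FAIL {state}: {exc}")
    return passed, failed
```

    Without keep the loop stops at the first failing state; with it, every state is tried and all failures appear in the summary.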
    
    So far the test has found several issues:

    - Dump doesn't work for a half-closed connection with unread messages
    - A pending half-closed connection is not restored
    - Socket name is not restored
    - Message is not restored
    
    New TODO:
    
    - Check that a restored listening socket can still accept connections (?)
    - Add socketpair()s
    - Add on-disk names
    - Add SCM-s
    - Exhaustive script for other resources

    Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
    Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>