shmem: use mincore page status bit with anon shared mem dumping

It turned out that anon shmem can have pages with non-zero content and
with both PME_PRESENT and PME_SWAP bits unset in all its vmas in the
whole ps tree. Such a case is reproduced in issue #209:

1. Dump a ps tree with anon shmem filled using datagen.
2. Restore the ps tree. The anon shmem content is restored in
   open_shmem(). An fd is created for it and it is unmapped from the
   restorer process.
3. The anon shmem vma is mapped in restore_mapping() in the pie
   restorer context. The anon shmem content is already initialized to
   non-zero content, but the restored process doesn't touch its newly
   mapped vma.
4. Run CRIU dump again. All the pages of the anon shmem vmas have
   PME_PRESENT and PME_SWAP bits unset, so we don't put the vma pages
   into the dump.

So if we filter anon shmem pages using the PME_PRESENT and PME_SWAP
bits the same way as we do for anon private mem, we have a bug.
The PME_PRESENT and PME_SWAP bits work for anon private mem because at
least one process restores the content of a private anon vma in its
own address space, thus the PME bits are set and the pages are dumped.

We can't just stop using the PME_PRESENT and PME_SWAP bits and dump all
non-soft-dirty pages with a non-zero pfn. In that case each 1 GB of
mapped but unused anon shmem vma would go to the dump, which is
unacceptable.

To fix the bug, this patch uses mincore bits to make the final decision
on whether a page should be dumped. mincore bits show page usage
status better because mincore performs a deeper check of internal
in-kernel state, while PME bits are filled based only on the process
page table.

Using mincore has a drawback: it doesn't work when a page is in swap.
But that's OK for now, because mincore was used before we started
using PME bits. Also, mincore doesn't break the page change tracking
functionality for anon shmem that we have now.

This bug can be fixed in another way. For example, we can make anon
shmem restoration work similarly to anon private mem restoration. But
that fix looks much harder to implement.

Signed-off-by: Eugene Batalov <eabatalov89@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
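
For illustration only, the mincore check described in the message can be
sketched as below. This is a minimal standalone example, not the code
from this patch: the helper names count_resident_pages() and
demo_shared_anon_residency() are hypothetical, and it assumes Linux,
where mincore() reports residency in the low bit of each vector byte.
It maps a small anon shmem region, touches some pages and counts how
many the kernel reports as resident.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: count pages in [addr, addr + len) that mincore()
 * reports as resident. Returns -1 on error. addr must be page aligned.
 */
static long count_resident_pages(void *addr, size_t len)
{
	long page_size = sysconf(_SC_PAGESIZE);
	unsigned char *vec;
	size_t nr_pages, i;
	long resident = 0;

	assert(page_size > 0);
	nr_pages = (len + (size_t)page_size - 1) / (size_t)page_size;

	vec = malloc(nr_pages);
	if (!vec)
		return -1;

	if (mincore(addr, len, vec) < 0) {
		free(vec);
		return -1;
	}

	/* Only the least significant bit of each byte is meaningful. */
	for (i = 0; i < nr_pages; i++)
		if (vec[i] & 1)
			resident++;

	free(vec);
	return resident;
}

/*
 * Hypothetical demo: map nr_pages of anon shmem, write to the first
 * nr_touched pages and return the resident page count (-1 on error).
 */
static long demo_shared_anon_residency(size_t nr_pages, size_t nr_touched)
{
	size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = nr_pages * page_size;
	size_t i;
	long resident;
	char *mem;

	mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;

	/* A write fault allocates the backing shmem page. */
	for (i = 0; i < nr_touched && i < nr_pages; i++)
		mem[i * page_size] = 1;

	resident = count_resident_pages(mem, len);
	munmap(mem, len);
	return resident;
}
```

Calling demo_shared_anon_residency(4, 2) is expected to report only the
touched pages as resident, which is why an untouched 1 GB mapping would
not end up in the dump. Because mincore consults the shmem backing
object rather than the caller's page table, a page filled through
another mapping of the same object is still reported, which is the case
the PME bits miss.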