    blockjob: Make block_job_pause_all() keep a reference to the jobs · 3d5d319e
    Alberto Garcia authored
    Starting from commit 40840e41, we are pausing all block jobs during
    bdrv_reopen_multiple() to prevent any of them from finishing and
    removing nodes from the graph while they are being reopened.
    
    It turns out that pausing a block job doesn't necessarily prevent it
    from finishing: a paused block job can still run its exit function
    from the main loop and call block_job_completed(). The mirror block
    job in particular always goes to the main loop while it is paused (by
    virtue of the bdrv_drained_begin() call in mirror_run()).
    
    Destroying a paused block job during bdrv_reopen_multiple() has two
    consequences:
    
       1) The references to the nodes involved in the job are released,
          possibly destroying some of them. If those nodes were in the
          reopen queue this would trigger the problem originally described
          in commit 40840e41, crashing QEMU.
    
       2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would
          not be doing all necessary bdrv_parent_drained_end() calls.
    
    I can reproduce problem 1) easily with iotest 030 by increasing
    STREAM_BUFFER_SIZE from 512KB to 8MB in block/stream.c, or by tweaking
    the iotest like in this example:
    
       https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html
    
    This patch keeps an additional reference to all block jobs between
    block_job_pause_all() and block_job_resume_all(), guaranteeing that
    they are kept alive.
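
    The idea can be sketched with a toy model. Everything below
    (ToyJob, the toy_* functions, the job list) is hypothetical and only
    illustrates the reference-counting pattern; it is not the code in
    blockjob.c, and it leaves out AioContext locking and the real pause
    logic. It assumes, as a simplification, that a job leaves the global
    list only when its last reference is dropped.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a block job; not QEMU's BlockJob. */
typedef struct ToyJob {
    int refcnt;            /* job is destroyed when this reaches 0 */
    int pause_count;       /* >0 means the job is paused */
    struct ToyJob *next;   /* link in the global job list */
} ToyJob;

static ToyJob *toy_jobs;       /* head of the global job list */
static int toy_live_jobs;      /* number of jobs not yet destroyed */

static ToyJob *toy_job_create(void)
{
    ToyJob *job = calloc(1, sizeof(*job));
    if (!job) {
        abort();
    }
    job->refcnt = 1;           /* reference owned by the job itself */
    job->next = toy_jobs;
    toy_jobs = job;
    toy_live_jobs++;
    return job;
}

static void toy_job_ref(ToyJob *job)
{
    job->refcnt++;
}

static void toy_job_unref(ToyJob *job)
{
    assert(job->refcnt > 0);
    if (--job->refcnt > 0) {
        return;
    }
    /* Last reference gone: unlink from the list and destroy. */
    for (ToyJob **p = &toy_jobs; *p; p = &(*p)->next) {
        if (*p == job) {
            *p = job->next;
            break;
        }
    }
    toy_live_jobs--;
    free(job);
}

/* Models a job running its exit function while paused. */
static void toy_job_completed(ToyJob *job)
{
    toy_job_unref(job);        /* drops the job's own reference */
}

static void toy_pause_all(void)
{
    for (ToyJob *job = toy_jobs; job; job = job->next) {
        toy_job_ref(job);      /* extra reference held until resume */
        job->pause_count++;
    }
}

static void toy_resume_all(void)
{
    ToyJob *next;
    /* Save 'next' first: the unref below may free 'job'. */
    for (ToyJob *job = toy_jobs; job; job = next) {
        next = job->next;
        job->pause_count--;
        toy_job_unref(job);    /* release the reference from pause_all */
    }
}
```

    In this model, a job that completes between toy_pause_all() and
    toy_resume_all() only drops its own reference; it is destroyed
    later, when toy_resume_all() releases the extra one.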
    
    Signed-off-by: Alberto Garcia <berto@igalia.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>