
@bchalios
Contributor

When saving the state of a microVM with one or more block devices backed by the async IO engine, we need to take a few extra steps before serializing the state to disk: we must make sure there are no pending io_uring requests that the kernel has not handled yet. For devices that need this, we call a prepare_save() hook before serializing the device state.

If there are indeed pending requests, once we handle them we need to let the guest know, by adding the corresponding VirtIO descriptors to the used ring. Moreover, since we use notification suppression, this might or might not require us to send an interrupt to the guest.
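The "might or might not" above can be sketched as follows. This is a minimal illustration, assuming notification suppression follows the `used_event` rule from the VirtIO specification (`VIRTIO_F_EVENT_IDX`). `UsedRing` and `add_used` are hypothetical names for this sketch, not Firecracker's actual types.

```rust
/// Hypothetical, simplified bookkeeping for a virtqueue's used ring.
#[derive(Debug, Default)]
pub struct UsedRing {
    /// Next used index the device will publish (a free-running u16 counter).
    pub next_used: u16,
    /// `used_event` value last written by the driver: "interrupt me once the
    /// used index moves past this value".
    pub used_event: u16,
}

/// The `vring_need_event` check from the VirtIO specification: true when
/// `event_idx` lies in the half-open window [old_idx, new_idx), computed
/// modulo 2^16.
pub fn vring_need_event(event_idx: u16, new_idx: u16, old_idx: u16) -> bool {
    new_idx.wrapping_sub(event_idx).wrapping_sub(1) < new_idx.wrapping_sub(old_idx)
}

impl UsedRing {
    /// Publishes `completed` descriptors to the used ring and reports whether
    /// the guest needs an interrupt for them.
    pub fn add_used(&mut self, completed: u16) -> bool {
        if completed == 0 {
            // Nothing was drained, so nothing to signal.
            return false;
        }
        let old = self.next_used;
        self.next_used = old.wrapping_add(completed);
        vring_need_event(self.used_event, self.next_used, old)
    }
}
```

With `next_used = 5` and `used_event = 5`, completing one request crosses the index the driver asked about, so an interrupt is needed. With `used_event = 10` the same completion is published silently.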

Now, when we save the state of a VirtIO device, we save the device-specific state and the transport (MMIO or PCI) state along with it.

There were a few issues with how we were doing the serialization:

  1. We were saving the transport state before running the prepare_save() hook. The transport state includes information such as the interrupt_status in MMIO or the MSI-X config in PCI. With async IO, prepare_save() might change this state, so running it after saving the transport state essentially loses information.
  2. We were saving the device states after saving the KVM state. This is problematic because, if prepare_save() sends an interrupt to the guest, we don't save that "pending interrupt" bit of information in the snapshot.
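The first issue can be reproduced with a toy model. This is only a sketch: `ToyBlockDevice`, `TransportState`, `DeviceState` and the two `save_*` functions are hypothetical names for illustration, and the only detail taken from the VirtIO MMIO specification is that bit 0 of the interrupt status means "used buffer notification".

```rust
/// Bit 0 of the MMIO InterruptStatus register: used buffer notification.
pub const INT_USED_BUFFER: u32 = 0x1;

/// Hypothetical serialized transport state (MMIO flavour).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TransportState {
    pub interrupt_status: u32,
}

/// Hypothetical serialized device-specific state.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DeviceState {
    pub pending_requests: u32,
}

/// Toy block device with some requests still in flight in the IO engine.
pub struct ToyBlockDevice {
    pending_requests: u32,
    interrupt_status: u32,
}

impl ToyBlockDevice {
    pub fn new(pending_requests: u32) -> Self {
        Self { pending_requests, interrupt_status: 0 }
    }

    /// Drains in-flight requests. If any were drained, the guest is notified,
    /// which changes the transport's interrupt status.
    pub fn prepare_save(&mut self) {
        if self.pending_requests > 0 {
            self.pending_requests = 0;
            self.interrupt_status |= INT_USED_BUFFER;
        }
    }

    pub fn save_transport(&self) -> TransportState {
        TransportState { interrupt_status: self.interrupt_status }
    }

    pub fn save_device(&self) -> DeviceState {
        DeviceState { pending_requests: self.pending_requests }
    }
}

/// Problematic order: the transport state is captured before the hook runs.
pub fn save_transport_first(dev: &mut ToyBlockDevice) -> (TransportState, DeviceState) {
    let transport = dev.save_transport();
    dev.prepare_save();
    (transport, dev.save_device())
}

/// Order that keeps the information: run the hook, then capture both states.
pub fn save_after_prepare(dev: &mut ToyBlockDevice) -> (TransportState, DeviceState) {
    dev.prepare_save();
    (dev.save_transport(), dev.save_device())
}
```

In the first ordering the saved device state says "no pending requests" while the saved transport state says "no interrupt raised", so the completion notification exists nowhere in the snapshot.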

These two issues were making microVMs with block devices backed by async IO freeze in some cases after snapshot resume, since the guest is stuck in the kernel waiting for a notification from the device emulation which never arrives.

Currently, this is only a problem for virtio-block with the async IO engine. The only other device using the prepare_save() hook is virtio-net, but it neither modifies any VirtIO state nor sends interrupts.

Fix this by ensuring the correct ordering of operations during the snapshot phase.
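At the microVM level, the ordering constraint between KVM state and device state can be sketched like this. It is a toy model under stated assumptions, not the code changed by this PR: `ToyMicrovm`, `ToySnapshot` and the method names are hypothetical, and the interrupt controller state held by KVM is reduced to a single boolean.

```rust
/// Hypothetical snapshot contents, reduced to the two facts that matter here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ToySnapshot {
    /// Whether the saved KVM state contains a pending interrupt for the guest.
    pub kvm_irq_pending: bool,
    /// In-flight requests recorded in the saved device state.
    pub device_inflight_io: u32,
}

/// Toy microVM with one async-IO-backed block device.
#[derive(Debug)]
pub struct ToyMicrovm {
    kvm_irq_pending: bool,
    device_inflight_io: u32,
}

impl ToyMicrovm {
    pub fn with_inflight_io(device_inflight_io: u32) -> Self {
        Self { kvm_irq_pending: false, device_inflight_io }
    }

    /// Captures the (toy) KVM state as it is right now.
    fn save_kvm_state(&self) -> bool {
        self.kvm_irq_pending
    }

    /// Runs the prepare_save()-style drain, then captures device state.
    /// Draining completed requests injects an interrupt into KVM.
    fn save_device_states(&mut self) -> u32 {
        if self.device_inflight_io > 0 {
            self.device_inflight_io = 0;
            self.kvm_irq_pending = true;
        }
        self.device_inflight_io
    }

    /// Problematic order: the interrupt is injected after KVM state was captured.
    pub fn snapshot_kvm_first(&mut self) -> ToySnapshot {
        let kvm_irq_pending = self.save_kvm_state();
        let device_inflight_io = self.save_device_states();
        ToySnapshot { kvm_irq_pending, device_inflight_io }
    }

    /// Order that keeps the interrupt: devices first, KVM state afterwards.
    pub fn snapshot_devices_first(&mut self) -> ToySnapshot {
        let device_inflight_io = self.save_device_states();
        let kvm_irq_pending = self.save_kvm_state();
        ToySnapshot { kvm_irq_pending, device_inflight_io }
    }
}
```

When no IO is in flight, both orderings produce the same snapshot, which matches the observation that the freeze only happens in some cases.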

Fixes #5554

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@codecov

codecov bot commented Dec 15, 2025

Codecov Report

❌ Patch coverage is 60.86957% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.23%. Comparing base (ff0e866) to head (27c7be6).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/vmm/src/devices/virtio/net/device.rs | 33.33% | 8 Missing ⚠️ |
| src/vmm/src/devices/virtio/block/device.rs | 80.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5582      +/-   ##
==========================================
- Coverage   83.23%   83.23%   -0.01%     
==========================================
  Files         277      277              
  Lines       29263    29262       -1     
==========================================
- Hits        24358    24357       -1     
  Misses       4905     4905              

| Flag | Coverage Δ |
|---|---|
| 5.10-m5n.metal | 83.57% <60.86%> (-0.01%) ⬇️ |
| 5.10-m6a.metal | 82.91% <60.86%> (-0.01%) ⬇️ |
| 5.10-m6g.metal | 80.18% <60.86%> (-0.01%) ⬇️ |
| 5.10-m6i.metal | 83.57% <60.86%> (-0.01%) ⬇️ |
| 5.10-m7a.metal-48xl | 82.90% <60.86%> (-0.01%) ⬇️ |
| 5.10-m7g.metal | 80.19% <60.86%> (-0.01%) ⬇️ |
| 5.10-m7i.metal-24xl | 83.54% <60.86%> (-0.01%) ⬇️ |
| 5.10-m7i.metal-48xl | 83.54% <60.86%> (-0.01%) ⬇️ |
| 5.10-m8g.metal-24xl | 80.18% <60.86%> (-0.01%) ⬇️ |
| 5.10-m8g.metal-48xl | 80.18% <60.86%> (-0.01%) ⬇️ |
| 6.1-m5n.metal | 83.60% <60.86%> (-0.01%) ⬇️ |
| 6.1-m6a.metal | 82.94% <60.86%> (-0.01%) ⬇️ |
| 6.1-m6g.metal | 80.18% <60.86%> (-0.01%) ⬇️ |
| 6.1-m6i.metal | 83.60% <60.86%> (-0.01%) ⬇️ |
| 6.1-m7a.metal-48xl | 82.93% <60.86%> (-0.01%) ⬇️ |
| 6.1-m7g.metal | 80.19% <60.86%> (+<0.01%) ⬆️ |
| 6.1-m7i.metal-24xl | 83.61% <60.86%> (-0.01%) ⬇️ |
| 6.1-m7i.metal-48xl | 83.61% <60.86%> (-0.01%) ⬇️ |
| 6.1-m8g.metal-24xl | 80.18% <60.86%> (-0.01%) ⬇️ |
| 6.1-m8g.metal-48xl | 80.18% <60.86%> (-0.01%) ⬇️ |


Signed-off-by: Babis Chalios <[email protected]>
@bchalios bchalios force-pushed the fix_async_io_save_restore branch from 3f71ea3 to 27c7be6 Compare December 15, 2025 16:01
@pkit

pkit commented Dec 16, 2025

Maybe it makes sense to add the tests from my PR?
Although they can flake, so maybe not the best idea. On the other hand it would "protect" from accidentally messing up the order in some later PRs.

Development

Successfully merging this pull request may close these issues.

[Bug] When using Async IO Engine pending ops cause resume to freeze
