Skip to content

Conversation

@sseago
Copy link
Contributor

@sseago sseago commented Dec 2, 2025

Why the changes were made

This PR adds a design document for the VM qcow2 incremental backup feature.

How to test the changes made

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 2, 2025

Walkthrough

Adds a new design document that defines an incremental qcow2 datamover for Virt integration with OADP/Velero, covering goals, use cases, high-level architecture, BackupItemAction/RestoreItemAction behavior, a Virt Backup Controller, DataUpload/DataDownload semantics, storage layout, and open questions.

Changes

Cohort / File(s) Summary
Design Documentation
docs/design/virt-qcow2-datamover.md
New design doc describing an incremental qcow2 datamover: background, goals/non-goals, use cases; required Velero changes (treat unrecognized volume policy actions as skip; SnapshotType in data upload/download); BIA/RIA plugin behavior for VirtualMachineInstance (CBT checks, VM running state, status updates, per-VM/PVC policy evaluation); temporary PVC handling and DataUpload creation (SnapshotType "qcow2", DataMover "cnv"), operation tracking and annotations; Virt Backup Controller reconcilers for DataUpload/DataDownload with datamover pod orchestration and interactions with VirtualMachineBackup/VirtualMachineBackupTracker; storage layout for qcow2 files and metadata; open questions (PVC sizing/deletion semantics, snapshot mappings, force-full backup triggers) and notes on CBT, datamover pluggability, and push vs pull modes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review focus:
    • Backup/restore sequencing and state transitions between BIA/RIA, DataUpload/DataDownload, and Virt Backup Controller.
    • Use and compatibility of SnapshotType and DataMover annotations with Velero/OADP expectations.
    • Open questions affecting implementation: PVC sizing/deletion, snapshot mapping, and force-full backup behavior.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link

openshift-ci bot commented Dec 2, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sseago

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 2, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
docs/design/virt-qcow2-datamover.md (3)

1-25: Fix markdown formatting issues and correct spelling in intro section.

Line 13 contains a bare URL; wrap it in angle brackets or use markdown link syntax. Lines 20–22 have incorrect list indentation (4 spaces instead of 2). Line 16 has a spelling error: "mantainability" should be "maintainability".

Apply these fixes:

- Implementation based on kubevirt enhancement defined at https://github.com/kubevirt/enhancements/blob/main/veps/sig-storage/incremental-backup.md
+ Implementation based on kubevirt enhancement defined at <https://github.com/kubevirt/enhancements/blob/main/veps/sig-storage/incremental-backup.md>

- Deep integration with velero data mover pluggability (this could be considered in the long-term though, which would minimize duplication of effort and enhance mantainability)
+ Deep integration with velero data mover pluggability (this could be considered in the long-term though, which would minimize duplication of effort and enhance maintainability)

- As a user I want to use OADP to trigger backups that will back up volume data using libvirt tooling rather than CSI snapshots
-    - Volume backups will be incremental when possible (first backup for a given volume will be full, subsequent backups for that same volume will be incremental)
-    - Based on the assumption that libvirt incremental volume backups should be much faster than CSI snapshots followed by incremental kopia snapshot copy, the expectation is that users might run libvirt-based OADP backups more frequently than they would for CSI-based backups.
-    - If users are backing up more frequently with this method, they should ensure that they are not using CSI snapshots or fs-backup via the resource/volume policy configuration in the backup.
+ As a user I want to use OADP to trigger backups that will back up volume data using libvirt tooling rather than CSI snapshots
+  - Volume backups will be incremental when possible (first backup for a given volume will be full, subsequent backups for that same volume will be incremental)
+  - Assuming libvirt incremental backups are faster than CSI snapshots with incremental kopia, users may run these backups more frequently.
+  - If users are backing up more frequently with this method, they should ensure that they are not using CSI snapshots or fs-backup via the resource/volume policy configuration in the backup.

82-98: Correct spelling errors and hyphenation in storage strategy section.

Line 86 should use a hyphen: "top level" → "top-level" (compound adjective). Line 96 has a spelling error: "invidivual" should be "individual". Consider also tightening line 96–97 for clarity by replacing "I'm not sure we get any real benefits" with a more direct phrasing, and simplifying "on top of that" to "additionally" or "furthermore".

Apply these fixes:

- Create a top level dir in the BSL (under the BSL prefix, parallel to backups/restores/kopia) for libvirt datamover.
+ Create a top-level dir in the BSL (under the BSL prefix, parallel to backups/restores/kopia) for libvirt datamover.

- We could use kopia on top of the object storage API, but I'm not sure we get any real benefits, since we're already working with files that represent just the data diff we need. We can just manage them as invidivual objects
+ We could use kopia on top of the object storage API. However, since we're already working with files representing just the data diff needed, we can manage them as individual objects

- This will also require additional overhead around kopia maintenance, etc. and on top of that, we still may need to manage qcow2 file deletion manually.
+ This will also require additional overhead around kopia maintenance, and additionally, we still may need to manage qcow2 file deletion manually.

99-113: Correct spelling error in force full backup discussion.

Line 108 has a spelling error: "datmover" should be "datamover". Also apply consistent 2-space list indentation throughout lines 101–104.

Apply this fix:

- For initial release, we can add a force-full-virt-backup annotation on the velero backup. Longer-term, we can push for a general datamover feature in velero which could force full backups for both fs-backup and velero datamover if backup.spec.forceFullVolumeBackup is true, and once implemented, the qcow2 datmover can use this as well.
+ For initial release, we can add a force-full-virt-backup annotation on the velero backup. Longer-term, we can push for a general datamover feature in velero which could force full backups for both fs-backup and velero datamover if backup.spec.forceFullVolumeBackup is true, and once implemented, the qcow2 datamover can use this as well.

For the Open Questions section, ensure nested lists use 2-space indentation:

- ### Open questions
- - How to determine PVC size?
-     - user-configurable? configmap or annotation?
-     - From the kubevirt enhancement: "Before the process begins, an estimation of the required backup size will be performed. If the provided PVC size is insufficient, an error will be returned"
-     - If the PVC is too small, we need a clear error on the backup indicating that it failed due to insufficient PVC space.
-     - If controller is responsible for PVC creation rather than plugin, then the controller may be able to respond to PVC too small errors by retrying with a larger PVC.
+ ### Open questions
+ - How to determine PVC size?
+   - user-configurable? configmap or annotation?
+   - From the kubevirt enhancement: "Before the process begins, an estimation of the required backup size will be performed. If the provided PVC size is insufficient, an error will be returned"
+   - If the PVC is too small, we need a clear error on the backup indicating that it failed due to insufficient PVC space.
+   - If controller is responsible for PVC creation rather than plugin, then the controller may be able to respond to PVC too small errors by retrying with a larger PVC.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 02a9165 and 43551b2.

📒 Files selected for processing (1)
  • docs/design/virt-qcow2-datamover.md (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • docs/design/virt-qcow2-datamover.md
🪛 LanguageTool
docs/design/virt-qcow2-datamover.md

[style] ~21-~21: ‘Based on the assumption that’ might be wordy. Consider a shorter alternative.
Context: ... same volume will be incremental) - Based on the assumption that libvirt incremental volume backups shou...

(EN_WORDINESS_PREMIUM_BASED_ON_THE_ASSUMPTION_THAT)


[grammar] ~31-~31: Ensure spelling is correct
Context: ...ckupItemAction/RestoreItemAction plugins - VitualMachineInstance BIA plugin - This plugin will check...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~41-~41: Ensure spelling is correct
Context: ...ill be added to the PVC identifying the VMInstance we're backing up. - This PVC probab...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~75-~75: Ensure spelling is correct
Context: ...ackup and incremental files from object store, repeatedly call `qemu-img rebase -b fu...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~80-~80: Ensure spelling is correct
Context: ...ored the VM disks, it will exit and the VMInstance can launch with these disks (following ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~86-~86: Use a hyphen to join words.
Context: ...ence/object-store.go` - Create a top level dir in the BSL (under the BSL pref...

(QB_NEW_EN_HYPHEN)


[style] ~96-~96: Consider adding a conjunction to make this sound more formal.
Context: ... of the object storage API, but I'm not sure we get any real benefits, since we're alre...

(NOT_SURE_IT_WORKS)


[grammar] ~96-~96: Ensure spelling is correct
Context: ...iff we need. We can just manage them as invidivual objects - This will also requir...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~96-~96: Ensure spelling is correct
Context: ...n just manage them as invidivual objects - This will also require additional overhe...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~97-~97: ‘on top of that’ might be wordy. Consider a shorter alternative.
Context: ...head around kopia maintenance, etc. and on top of that, we still may need to manage qcow2 file...

(EN_WORDINESS_PREMIUM_ON_TOP_OF_THAT)


[style] ~106-~106: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ld have more than one snapshot file. We need to handle this mapping, including the orde...

(REP_NEED_TO_VB)


[grammar] ~108-~108: Ensure spelling is correct
Context: ...s true, and once implemented, the qcow2 datmover can use this as well. ### General note...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
docs/design/virt-qcow2-datamover.md

13-13: Bare URL used

(MD034, no-bare-urls)


20-20: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


21-21: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


22-22: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


33-33: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


34-34: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


35-35: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


36-36: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


37-37: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


38-38: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


39-39: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


40-40: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


41-41: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


42-42: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


43-43: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


44-44: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


45-45: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


46-46: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


48-48: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


49-49: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


50-50: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


55-55: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


56-56: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


57-57: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


58-58: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


59-59: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


60-60: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


61-61: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


62-62: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


63-63: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


63-63: Link fragments should be valid
Expected: #wherehow-to-store-qcow2-files-and-metadata; Actual: #Wherehow-to-store-qcow2-files-and-metadata

(MD051, link-fragments)


64-64: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


64-64: Link fragments should be valid
Expected: #wherehow-to-store-qcow2-files-and-metadata; Actual: #Wherehow-to-store-qcow2-files-and-metadata

(MD051, link-fragments)


65-65: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


66-66: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


67-67: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


69-69: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


70-70: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


71-71: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


72-72: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


73-73: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


74-74: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


75-75: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


76-76: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


77-77: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


78-78: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


79-79: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


80-80: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


85-85: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


86-86: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


87-87: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


88-88: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


89-89: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


91-91: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


92-92: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


93-93: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


94-94: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


95-95: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


96-96: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


97-97: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


101-101: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


102-102: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


103-103: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


104-104: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

Comment on lines 63 to 64
- Copy the new file to object storage (see [Where to store qcow2 files](#Wherehow-to-store-qcow2-files-and-metadata) below)
- Save any required metadata to identify the stored data (collection of qcow2 pathnames/checkpoints, etc.), along with identifying the backup and VMInstance they're associated with. Save this metadata file as well (see [Where to store qcow2 files](#Wherehow-to-store-qcow2-files-and-metadata) below)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix broken markdown link fragments.

Lines 63–64 reference section links with incorrect capitalization. Markdown link fragments should be lowercase: change #Wherehow-to-store-qcow2-files-and-metadata to #wherehow-to-store-qcow2-files-and-metadata.

Apply this fix:

- Copy the new file to object storage (see [Where to store qcow2 files](#Wherehow-to-store-qcow2-files-and-metadata) below)
+ Copy the new file to object storage (see [Where to store qcow2 files](#wherehow-to-store-qcow2-files-and-metadata) below)

- Save any required metadata to identify the stored data (collection of qcow2 pathnames/checkpoints, etc.), along with identifying the backup and VMInstance they're associated with. Save this metadata file as well (see [Where to store qcow2 files](#Wherehow-to-store-qcow2-files-and-metadata) below)
+ Save any required metadata to identify the stored data (collection of qcow2 pathnames/checkpoints, etc.), along with identifying the backup and VMInstance they're associated with. Save this metadata file as well (see [Where to store qcow2 files](#wherehow-to-store-qcow2-files-and-metadata) below)
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

63-63: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


63-63: Link fragments should be valid
Expected: #wherehow-to-store-qcow2-files-and-metadata; Actual: #Wherehow-to-store-qcow2-files-and-metadata

(MD051, link-fragments)


64-64: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


64-64: Link fragments should be valid
Expected: #wherehow-to-store-qcow2-files-and-metadata; Actual: #Wherehow-to-store-qcow2-files-and-metadata

(MD051, link-fragments)

🤖 Prompt for AI Agents
In docs/design/virt-qcow2-datamover.md around lines 63 to 64, the Markdown link
fragment is incorrectly capitalized as
#Wherehow-to-store-qcow2-files-and-metadata; update both occurrences to the
lowercase fragment #wherehow-to-store-qcow2-files-and-metadata so the links
resolve correctly (ensure you change both links in the two bullet points).


- Back up and restore VM volumes using libvirt incremental backup instead of Velero's built-in CSI datamover
- Use existing Velero infrastructure to integrate this feature into regular velero backup and restore
- Implementation based on kubevirt enhancement defined at https://github.com/kubevirt/enhancements/blob/main/veps/sig-storage/incremental-backup.md
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the other URL for context, but noting that using pull mode is out of scope for this design.

## High-Level design

### Upstream velero changes
- Update velero volume policy model to allow unrecognized volume policy actions. Velero would treat unrecognized actions as "skip" (i.e. no snapshot or fs-backup), but the libvirt datamover could only act if the policy action is "virt".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meh.. naming is hard. Since this is CNV specific and it's a possible upstream name.. I suggest s/virt/cnv but that's just my opinion. Maybe less offensive to the powers that be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"virt" or "cnv" won't go upstream. The upstream change is to ignore any unrecognized policy action, so upstream will treat "virt", "cnv", or even "weshayutin" all the same way. That being said, we do need to decide what string to use for the policy. Should we use "cnv" instead? I think that's probably better than virt here anyway. Also, overall, should we call this feature "CNV incremental datamover"? Related, we've already created migtools repos: "kubevirt-datamover-plugin" and "kubevirt-datamover-controller" -- should those names also use cnv instead of kubevirt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm changing the design to use "cnv" for the datamover type field in the DU/DD as well as for the volume policy action, instead of "virt" for both. For the moment, I have not changed any other usage of "virt" or "libvirt" to "cnv". We should probably have a more general naming discussion to work out the related issues.

@sseago sseago force-pushed the virt-datamover-design branch from 43551b2 to b64475c Compare December 12, 2025 20:52
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 43551b2 and b64475c.

📒 Files selected for processing (1)
  • docs/design/virt-qcow2-datamover.md (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • docs/design/virt-qcow2-datamover.md
🪛 LanguageTool
docs/design/virt-qcow2-datamover.md

[grammar] ~77-~77: Ensure spelling is correct
Context: ...ackup and incremental files from object store, repeatedly call `qemu-img rebase -b fu...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~108-~108: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ld have more than one snapshot file. We need to handle this mapping, including the orde...

(REP_NEED_TO_VB)

🪛 markdownlint-cli2 (0.18.1)
docs/design/virt-qcow2-datamover.md

13-13: Bare URL used

(MD034, no-bare-urls)


14-14: Bare URL used

(MD034, no-bare-urls)

Comment on lines 13 to 18
- Implementation based on kubevirt enhancement defined at https://github.com/kubevirt/enhancements/blob/main/veps/sig-storage/incremental-backup.md
- There is a follow-on design PR at https://github.com/kubevirt/enhancements/pull/126/changes although this mainly discusses push-mode, which is out of scope for the current design.

## Non goals
- Deep integration with velero data mover pluggability (this could be considered in the long-term though, which would minimize duplication of effort and enhance mantainability)
- Using push mode with kubevirt.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix typo + avoid bare URLs for readability and linting
Line 17 has “mantainability” → “maintainability”, and Lines 13–14 should be markdown links (MD034).

- - Deep integration with velero data mover pluggability (this could be considered in the long-term though, which would minimize duplication of effort and enhance mantainability)
+ - Deep integration with velero data mover pluggability (this could be considered in the long-term though, which would minimize duplication of effort and enhance maintainability)

- - Implementation based on kubevirt enhancement defined at https://github.com/kubevirt/enhancements/blob/main/veps/sig-storage/incremental-backup.md
-   - There is a follow-on design PR at https://github.com/kubevirt/enhancements/pull/126/changes although this mainly discusses push-mode, which is out of scope for the current design.
+ - Implementation based on kubevirt enhancement defined at <https://github.com/kubevirt/enhancements/blob/main/veps/sig-storage/incremental-backup.md>
+   - There is a follow-on design PR at <https://github.com/kubevirt/enhancements/pull/126/changes> although this mainly discusses push-mode, which is out of scope for the current design.
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

13-13: Bare URL used

(MD034, no-bare-urls)


14-14: Bare URL used

(MD034, no-bare-urls)

🤖 Prompt for AI Agents
In docs/design/virt-qcow2-datamover.md around lines 13 to 18, fix the typo
"mantainability" → "maintainability" and replace the bare URLs on lines 13–14
with proper Markdown link syntax; update the sentence to embed the kubevirt
enhancement and follow-on design PR as clickable links (e.g. [title](url)) so
the doc passes MD034 linting and reads correctly.

Comment on lines +29 to +32
### Upstream velero changes
- Update velero volume policy model to allow unrecognized volume policy actions. Velero would treat unrecognized actions as "skip" (i.e. no snapshot or fs-backup), but the libvirt datamover could only act if the policy action is "cnv".
- In `pkg/restore/actions/dataupload_retrieve_action.go` and in `DataDownload` we need to add SnapshotType.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Clarify field naming: Go struct fields vs YAML fields (prevents implementation drift)
The doc mixes casings like Spec.DataMover / spec.DataMover and SnapshotMoveData in prose. Recommend explicitly stating whether you’re referring to Go API fields or YAML, and using consistent notation (e.g., spec.dataMover, spec.snapshotType, spec.snapshotMoveData) to avoid someone implementing the wrong field name.

Also applies to: 46-48, 54-56, 113-113

🤖 Prompt for AI Agents
In docs/design/virt-qcow2-datamover.md around lines 29 to 32 (also apply same
clarification at lines 46-48, 54-56, and 113), the prose mixes Go struct field
names and YAML/JSON field names (e.g., Spec.DataMover vs spec.DataMover vs
SnapshotMoveData); update the document to explicitly state which notation refers
to the Go API (PascalCase exported struct fields) and which refers to the
YAML/JSON manifest keys (camelCase/lowerCamel), then normalize examples and
prose to use a consistent YAML-style notation for manifest fields (e.g.,
spec.dataMover, spec.snapshotType, spec.snapshotMoveData) while retaining a note
mapping each YAML key to the corresponding Go struct field name (e.g.,
Spec.DataMover -> spec.dataMover) to prevent implementation drift.

@sseago sseago force-pushed the virt-datamover-design branch from b64475c to 46a2119 Compare December 12, 2025 22:38
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
docs/design/virt-qcow2-datamover.md (1)

29-32: Be explicit about YAML keys vs Go struct fields to avoid implementation drift
The doc uses Go-style (Spec.DataMover, Spec.SnapshotType) in prose; implementers may copy these into manifests/controllers incorrectly. Recommend standardizing on manifest-style (e.g., spec.dataMover, spec.snapshotType, spec.snapshotMoveData) and adding a one-time mapping note to Go fields.

Also applies to: 46-48, 55-56, 113-113

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between b64475c and 46a2119.

📒 Files selected for processing (1)
  • docs/design/virt-qcow2-datamover.md (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • docs/design/virt-qcow2-datamover.md
🪛 LanguageTool
docs/design/virt-qcow2-datamover.md

[grammar] ~77-~77: Ensure spelling is correct
Context: ...ackup and incremental files from object store, repeatedly call `qemu-img rebase -b fu...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~108-~108: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ld have more than one snapshot file. We need to handle this mapping, including the orde...

(REP_NEED_TO_VB)

Comment on lines +70 to +83
- DataDownload reconciler (restore)
- (this area is less well defined so far, since the kubevirt enhancement doesn't go into as much detail on restore)
- We will need a temporary PVC for pulling qcow2 images from object store (if we're restoring the empty temp PVC from backup, that might work here)
- We also need PVCs created for each VM disk we're restoring from qcow2 images.
- We'll need to create another datamover pod here which will do the following:
- pod will have temp PVC mounted, as well as PVCs mounted for each vm disk we're creating.
- pod running command/image will first get the list of qcow2 files to pull from object storage
- once we have the qcow2 full backup and incremental files from object store, repeatedly call `qemu-img rebase -b fullbackup.qcow2 -f qcow2 -u incrementalN.qcow2` for each incremental backup, in order
- Then, convert qcow2 image to raw disk image: `qemu-img convert -f qcow2 -O raw fullbackup.qcow2 restored-raw.img`
- Finally, write this image to the PVC which contains the VM disk
- (repeat process for each disk if VM has multiple disks to restore)
- Note that the various `qemu-img` actions might eventually be combined into a single kubevirt API call, but for the moment this would need to be done manually.
- Once datamover pod has restored the VM disks, it will exit and the VirtualMachineInstance can launch with these disks (following the velero datamover model where the temporary PVC is deleted, retaining the PV, which then binds to the VM's PVCs may work here). The temporary PVC (containing the qcow2 images, not the restored VM disk image) should be completely removed at this point, including PV content.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

qemu-img rebase -u unsafe flag documentation

💡 Result:

Summary — what -u (unsafe) does

  • qemu-img rebase -u switches to "unsafe" mode: it only changes the backing-file name/format in the image metadata without verifying or merging any data. Using it can corrupt the guest-visible contents if the new backing file is incorrect. Use only for renaming/moving a backing file or fixing an image whose backing file is already unavailable. [1][2]

Risks and recommendation

  • Unsafe is fast but dangerous: no content checks or merges are performed; the user is responsible for correctness. For safety, prefer the default (safe) rebase when the old backing file exists. [1][2]
  • QEMU projects recommend specifying the backing format explicitly (e.g., -o backing_fmt=...) when creating or changing backing files to avoid format-probing risks; you can use qemu-img rebase -u to add a backing_fmt to an existing image but be aware of security implications. [3]

Quick example

  • qemu-img rebase -u -b /path/to/new-back.img image.qcow2 (unsafe: just update metadata). [1][2]

Sources
[1] QEMU qemu-img docs (rebase — safe vs unsafe).
[2] qemu-img manpage / Debian QEMU utils.
[3] QEMU deprecated-features / backing-file format guidance.


🌐 Web query:

qemu-img rebase backing file format -F flag

💡 Result:

-F (or --backing-fmt) tells qemu-img the format of the backing file (BACKING_FMT) when creating or rebasing an image (used together with -b BACKING_FILE). It is needed when the backing file’s format can’t be autodetected (or you want to override detection). Only formats that support backing files (qcow2 and qed for rebase) can have their backing changed. Example: qemu-img rebase -b base.qcow2 -F qcow2 child.qcow2 (safe mode by default; use -u for unsafe). [1][2][3]

Sources:

  • QEMU qemu-img documentation (rebase, -F). [1]
  • QEMU qemu-img manual (rebase synopsis). [2]
  • qemu-img manpage / distro docs noting formats that support backing files. [3]

🏁 Script executed:

# Search for any qemu-img documentation or references in the repository
git ls-files | head -20

Repository: openshift/oadp-operator

Length of output: 486


🌐 Web query:

qemu-img rebase incremental backup restore best practices

💡 Result:

Key best practices for qemu-img rebase with incremental (qcow2) backups

  • Use qcow2 and QEMU dirty‑bitmaps for incremental backups (point‑in‑time, efficient deltas). Test your QEMU/libvirt versions support bitmaps. [2][5]
  • Keep a known-good full backup as the chain base; create each incremental as a qcow2 with the previous image as its backing file. [3][2]
  • Avoid long backing chains — limit incremental chain length and periodically consolidate (merge) incrementals into a new full image to reduce restore time and corruption risk. [4][5]
  • Never run destructive in‑place rebases on your only copy. Work on a copy of the backup folder or use qmprestore/merge to produce a new image so originals remain intact. [4]
  • Prefer qemu-img rebase safe mode (default) when the old backing file exists — it preserves guest-visible contents by merging needed clusters. Only use unsafe (-u) when you are certain the backing chain is correct (e.g., renames/moves). [1][3]
  • For restores, set the backing chain explicitly (or use qemu-img rebase/merge) and verify final image with qemu-img check and by booting/test-mounting before production use. [1][4]
  • Ensure bitmap handling and backup timing: incremental backups capture blocks modified up to backup start; writes after start may be tracked separately — follow bitmap lifecycle to avoid missed changes. [2][4]
  • Store bitmaps/metadata and backup manifests together, make backups read‑only where possible, and keep consistent naming with timestamps and sequence numbers to avoid rebase mistakes. [4][5]
  • Automate and test restores regularly (including rolling back to specific incremental points); document exact qemu/qemu-img versions used because behavior/options can change across versions. [1][2][5]

Primary references

  • qemu-img rebase documentation and modes (safe vs unsafe). [1]
  • QEMU Dirty Bitmaps / incremental backup guide. [2][3]
  • qmpbackup/virtnbdbackup restore and rebase recommendations (practical tooling and safety suggestions). [4][5]

🌐 Web query:

qemu-img rebase safe mode incremental chain without -u flag

💡 Result:

Short answer: Yes — qemu-img rebase will run in safe mode (the default) without -u, and will correctly rebase an incremental (backing-file) chain — but it requires the original backing files to be present and will merge differing clusters into the image (an expensive operation). [1][2]

Quick guidance:

  • Safe mode is default: qemu-img rebase -b NEW_BASE child.qcow2 (no -u). qemu-img will compare the old backing file and NEW_BASE, merging any clusters that differ into child before changing the backing pointer. This preserves guest-visible content. [1][2]
  • Requirements/limits: the old backing file must be accessible; operation is comparable to a convert in IO/CPU and can be slow for large images. If the old backing is missing, safe mode cannot complete. [1][3]
  • Unsafe mode (-u) only changes the backing-file pointer without merging — fast but can corrupt guest view if the specified backing is wrong or incompatible. Use -u only when you know the files/paths are correct (e.g., renames/moves). [1][2]

If you want a suggested command for a chain (example): rebase each overlay in turn, starting from the top overlay you want to retarget:

  • qemu-img rebase -b /path/to/new-base top-overlay.qcow2
    Repeat for subsequent overlays as needed (ensuring their old backing files are present). Alternatively, use qemu-img convert to flatten an image if you want a standalone file. [1][2]

Sources: QEMU qemu-img documentation (rebase — safe vs unsafe). [1] QEMU qemu-img rebase docs. [2] qemu-img manpages. [3]


Remove -u (unsafe) flag from qemu-img rebase command; use safe rebase with explicit backing format

In a restore scenario, the unsafe flag -u can silently corrupt the backing chain if files differ. Since the old backing files will be available during restore, safe rebase (the default) is both appropriate and necessary. Replace the command with qemu-img rebase -b fullbackup.qcow2 -F qcow2 -f qcow2 incrementalN.qcow2 to explicitly specify backing format and rely on safe merging of any differing clusters.

🧰 Tools
🪛 LanguageTool

[grammar] ~77-~77: Ensure spelling is correct
Context: ...ackup and incremental files from object store, repeatedly call `qemu-img rebase -b fu...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents
In docs/design/virt-qcow2-datamover.md around lines 70 to 83, update the
qemu-img rebase command used during restore: remove the unsafe -u flag and
replace the call with an explicit safe rebase invocation that specifies the
backing format and input format (e.g. use qemu-img rebase -b fullbackup.qcow2 -F
qcow2 -f qcow2 incrementalN.qcow2); ensure all occurrences and any documentation
text referencing the unsafe flag are updated accordingly.

@openshift-ci
Copy link

openshift-ci bot commented Dec 12, 2025

@sseago: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants