Support of snapshot copy to primary storage in different zones. #9478

Merged
sureshanaparti merged 15 commits into apache:main from
storpool:support-snapshot-copy-on-primary
Aug 4, 2025

@slavkap
Contributor

@slavkap slavkap commented Aug 1, 2024

Description

This PR allows copying a snapshot from a primary storage pool in one zone to a primary storage pool in a different zone without involving secondary storage.
This feature is currently implemented only in the StorPool storage plugin. Other storage plugins can add this feature if the storage pools support direct copies of snapshots from one pool to another.

Adds the API parameter usestoragereplication, plus a storageids parameter available to Admin users only, to the following CloudStack API calls:

createSnapshot
copySnapshot
createSnapshotPolicy

The option snapshot.backup.to.secondary does not apply to the copy functionality. The snapshots will be copied only to the required primary storage in a different zone.
The user can create volumes/templates from the copied snapshots. A copy can go either to primary storage or to secondary storage; at the moment, there is no option to copy to both.

The destination zone is a mandatory parameter when copying a snapshot, and usestoragereplication must also be set if the copy has to land on the primary storage.
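A minimal sketch of how a client might assemble the parameters for such a copy, assuming the rules described above. Only usestoragereplication and storageids come from this PR; the helper name, the id and destzoneids parameter names, and the comma-separated list encoding are assumptions made for illustration. Nothing is sent to a server.

```python
from typing import Dict, List, Optional


def build_copy_snapshot_params(
    snapshot_id: str,
    dest_zone_ids: List[str],
    use_storage_replication: bool = False,
    storage_ids: Optional[List[str]] = None,
    is_admin: bool = False,
) -> Dict[str, str]:
    """Build query parameters for a copySnapshot call (hypothetical helper)."""
    # The destination zone is mandatory for any snapshot copy.
    if not dest_zone_ids:
        raise ValueError("a destination zone is mandatory for a snapshot copy")
    # storageids is described as a parameter for Admin users only.
    if storage_ids and not is_admin:
        raise PermissionError("storageids is available to Admin users only")

    params = {
        "command": "copySnapshot",
        "id": snapshot_id,                        # assumed parameter name
        "destzoneids": ",".join(dest_zone_ids),   # assumed parameter name
    }
    # Request a copy to primary storage instead of secondary storage.
    if use_storage_replication:
        params["usestoragereplication"] = "true"
    if storage_ids:
        params["storageids"] = ",".join(storage_ids)
    return params
```

For example, a regular user would call build_copy_snapshot_params("snap-1", ["zone-2"], use_storage_replication=True), while an Admin could additionally pass storage_ids and is_admin=True to choose the destination pool.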

For Admin users:
Admin users can either specify the destination primary storage for the copy or enable the use.storage.replication setting in the Primary storage settings.

For regular Users:
The Admin should enable the configuration setting use.storage.replication in the Primary storage settings.

For other storage plugins that want to adopt this functionality:

  • The Primary storage driver should have the capability CAN_COPY_SNAPSHOT_BETWEEN_ZONES
  • The respective plugin needs to implement the copySnapshot method in its SnapshotStrategy, and the driver must be able to handle the COPY operation
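
The capability check above can be pictured with a small language-neutral sketch, written here in Python rather than the Java of the actual plugins. The class, the function, and the representation of capabilities as a string-to-string map are all hypothetical; only the capability name CAN_COPY_SNAPSHOT_BETWEEN_ZONES is taken from this PR.

```python
from dataclasses import dataclass, field
from typing import Dict

CAN_COPY_SNAPSHOT_BETWEEN_ZONES = "CAN_COPY_SNAPSHOT_BETWEEN_ZONES"


@dataclass
class PrimaryStorageDriver:
    """Hypothetical stand-in for a primary storage driver."""
    name: str
    # Assumption: capabilities are advertised as a name -> "true"/"false" map.
    capabilities: Dict[str, str] = field(default_factory=dict)

    def can_copy_between_zones(self) -> bool:
        value = self.capabilities.get(CAN_COPY_SNAPSHOT_BETWEEN_ZONES, "false")
        return value.lower() == "true"


def select_copy_path(driver: PrimaryStorageDriver, use_storage_replication: bool) -> str:
    """Pick the copy path, failing before any snapshot record is allocated."""
    if not use_storage_replication:
        return "secondary"  # traditional copy via secondary storage
    if not driver.can_copy_between_zones():
        raise RuntimeError(
            f"{driver.name} does not support copying snapshots between zones"
        )
    return "primary"


storpool = PrimaryStorageDriver("StorPool", {CAN_COPY_SNAPSHOT_BETWEEN_ZONES: "true"})
nfs = PrimaryStorageDriver("NFS")
print(select_copy_path(storpool, True))   # primary
print(select_copy_path(nfs, False))       # secondary
```

Checking the capability before doing any work means a pool without the capability is rejected up front rather than midway through a copy.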

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale

  • Major
  • Minor

Screenshots (if appropriate):

How Has This Been Tested?

Manual and smoke tests with StorPool primary storage on multiple zones

@codecov

codecov bot commented Aug 1, 2024

Codecov Report

❌ Patch coverage is 6.69746% with 808 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.33%. Comparing base (a60c8ca) to head (6668140).
⚠️ Report is 38 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...om/cloud/storage/snapshot/SnapshotManagerImpl.java | 8.08% | 174 Missing and 8 partials ⚠️ |
| ...ack/storage/snapshot/StorPoolSnapshotStrategy.java | 0.00% | 154 Missing ⚠️ |
| ...age/collector/StorPoolAbandonObjectsCollector.java | 0.00% | 69 Missing ⚠️ |
| ...org/apache/cloudstack/snapshot/SnapshotHelper.java | 14.70% | 52 Missing and 6 partials ⚠️ |
| ...tastore/driver/StorPoolPrimaryDataStoreDriver.java | 0.00% | 56 Missing ⚠️ |
| ...loudstack/storage/datastore/util/StorPoolUtil.java | 0.00% | 37 Missing ⚠️ |
| ...n/java/com/cloud/storage/VolumeApiServiceImpl.java | 24.39% | 29 Missing and 2 partials ⚠️ |
| ...oudstack/storage/snapshot/SnapshotServiceImpl.java | 0.00% | 27 Missing ⚠️ |
| ...e/wrapper/StorPoolModifyStorageCommandWrapper.java | 0.00% | 21 Missing ⚠️ |
| ...udstack/storage/datastore/util/StorPoolHelper.java | 0.00% | 21 Missing ⚠️ |

... and 25 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #9478      +/-   ##
============================================
- Coverage     17.35%   17.33%   -0.02%     
- Complexity    15189    15198       +9     
============================================
  Files          5883     5883              
  Lines        524514   525259     +745     
  Branches      64007    64131     +124     
============================================
+ Hits          91013    91042      +29     
- Misses       423216   423924     +708     
- Partials      10285    10293       +8     
| Flag | Coverage Δ |
|---|---|
| uitests | 3.63% <ø> (-0.01%) ⬇️ |
| unittests | 18.37% <6.69%> (-0.03%) ⬇️ |

@slavkap slavkap force-pushed the support-snapshot-copy-on-primary branch from 8cc7b49 to 0df6764 Compare August 14, 2024 09:14
@slavkap slavkap marked this pull request as ready for review August 14, 2024 09:51
@slavkap slavkap changed the title from "[WIP] Support of snapshot copy to StorPool primary storage in different zones" to "Support of snapshot copy to StorPool primary storage in different zones" Aug 14, 2024
@github-actions

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@slavkap slavkap force-pushed the support-snapshot-copy-on-primary branch from 0df6764 to e0f4283 Compare August 22, 2024 12:49
@DaanHoogland
Contributor

@blueorangutan package

@blueorangutan

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10756

@DaanHoogland
Contributor

@blueorangutan test

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-11155)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 51322 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9478-t11155-kvm-ol8.zip
Smoke tests completed. 137 look OK, 2 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_01_snapshot_usage | Failure | 38.32 | test_usage.py |
| test_01_vpc_site2site_vpn | Failure | 308.08 | test_vpc_vpn.py |

@github-actions

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@slavkap slavkap force-pushed the support-snapshot-copy-on-primary branch from e0f4283 to 6f209d8 Compare August 30, 2024 09:08
@yadvr yadvr added this to the 4.20.0.0 milestone Sep 4, 2024
@github-actions

github-actions bot commented Sep 4, 2024

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

Collaborator

@RosiKyu RosiKyu left a comment

@slavkap - great job! I've identified some minor issues that need to be addressed:

  1. Resource Cleanup: Failed snapshots left in unusable "Allocated" state - need proper rollback/cleanup
  2. UI Label: Shows label.useStorageReplication instead of "Use storage replication"

Below is a list of the test env I've used and everything that was covered:

Test Environment

3-zone CloudStack environment
NFS primary storage (NetworkFilesystem, DefaultPrimary provider)
6 test VMs deployed across all zones
Admin and regular user accounts configured
CloudMonkey CLI profiles for both user types

Test Areas Covered:

RBAC (Role-Based Access Control) Testing

  • Admin user snapshot creation with usestoragereplication=true
  • Admin user snapshot creation with storageids parameter
  • Regular user snapshot creation with usestoragereplication=true
  • Regular user snapshot creation with storageids parameter (ISSUE: the API call fails, but the snapshot is still created and left in the Allocated state)
  • Resource isolation testing - regular user accessing admin volumes (properly blocked)
  • Cross-user access validation - admin accessing regular user resources (working correctly)
  • Permission boundary testing for all new parameters

Snapshot Creation Testing

  • Traditional snapshot creation (admin user) - BackedUp state
  • Traditional snapshot creation (regular user) - BackedUp state
  • Snapshot creation with usestoragereplication=true (both users) - Error 530, Allocated state (this is due to the usage of NFS storage)
  • Snapshot creation with usestoragereplication=false (both users) - BackedUp state
  • Snapshot creation with usestoragereplication=invalid_value (both users) - BackedUp state (parameter ignored)
  • Snapshot creation with storageids parameter (both users) - Error 530, Allocated state in UI
  • VM state validation - snapshots require stopped VMs

Cross-Zone Copy Testing

  • Traditional cross-zone copy (admin user) - SUCCESS via secondary storage
  • Traditional cross-zone copy (regular user) - SUCCESS via secondary storage
  • Cross-zone copy with usestoragereplication=true (admin) - Error 431 (expected NFS limitation)
  • Cross-zone copy with usestoragereplication=true (regular user) - Error 431 (expected NFS limitation)
  • Cross-zone copy with storageids parameter (admin) - Error 530
  • Cross-zone copy with storageids parameter (regular user) - Error 431 (ISSUE: Should be blocked)
  • Cross-zone copy of "Allocated" state snapshots - Properly blocked
  • Cross-zone copy parameter validation (invalid zones, invalid storage IDs)

Snapshot Policy Testing

  • Traditional hourly policies (both users) - SUCCESS
  • Traditional daily policies (both users) - SUCCESS
  • Traditional weekly policies (both users) - SUCCESS
  • Policies with usestoragereplication=true parameter (both users) - Accepted
  • Policies with storageids parameter (both users) - Accepted
  • Policy schedule format validation - Proper rejection of invalid formats
  • Policy storage pool assignment - Automatic assignment working

Parameter Validation Testing

  • Boolean parameter testing: true, false, invalid_value
  • Storage ID validation: valid IDs, invalid IDs, non-existent IDs
  • Zone ID validation: valid zones, invalid zones
  • Parameter combination testing: multiple parameters together
  • Empty parameter testing
  • Malformed parameter testing

Backward Compatibility Testing

  • Traditional createSnapshot (no new parameters) - 100% working
  • Traditional copySnapshot via secondary storage - 100% working
  • Traditional createSnapshotPolicy - 100% working
  • Existing API endpoints unchanged - 100% compatible
  • No performance regression in traditional operations
  • Zero breaking changes in existing workflows

Error Handling Testing

  • Error consistency across user types - Same error codes for same operations
  • Graceful failure testing - No system crashes or exceptions
  • Invalid resource access - Proper error messages
  • Storage capability mismatch errors - Clear messaging
  • Permission denied scenarios - Appropriate responses
  • Malformed request handling - Robust validation

UI Integration Testing

  • New UI toggle functionality validation
  • Storage pool dropdown behavior testing
  • Form validation and submission testing
  • User type UI difference validation
  • Error message display in UI
  • UI label issue identified: label.useStorageReplication should be "Use storage replication"

Security Testing

  • Resource isolation between user types - Working correctly
  • Admin privilege validation - Working correctly
  • Regular user restriction testing - Mostly working (storageids issue)
  • Cross-account operation attempts - Properly blocked
  • Permission boundary enforcement - Partial (storageids bypass)
  • Privilege escalation prevention - Working (except storageids)

Integration Testing

  • End-to-end snapshot lifecycle - Complete workflow functional
  • Multi-zone operation workflows - Cross-zone logic working
  • Template creation from snapshots - SUCCESS (regression test)
  • Volume reversion from snapshots - SUCCESS (regression test)
  • Policy execution simulation - Scheduling working
  • System integration validation - All components connected properly

Regression Testing

  • Pre-existing snapshot functionality - 100% preserved
  • Traditional cross-zone copy methods - 100% working
  • Snapshot policy execution - 100% functional
  • API backward compatibility - 100% maintained
  • System stability - No regressions introduced

@blueorangutan

[SF] Trillian test result (tid-13973)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 94706 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9478-t13973-kvm-ol8.zip
Smoke tests completed. 138 look OK, 7 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| ContextSuite context=TestRouterServices>:setup | Error | 0.00 | test_routers.py |
| ContextSuite context=TestCpuCapServiceOfferings>:setup | Error | 0.00 | test_service_offerings.py |
| ContextSuite context=TestServiceOfferings>:setup | Error | 0.33 | test_service_offerings.py |
| ContextSuite context=TestSetSourceNatIp>:setup | Error | 0.00 | test_set_sourcenat.py |
| ContextSuite context=TestSnapshotRootDisk>:setup | Error | 0.00 | test_snapshots.py |
| ContextSuite context=TestSnapshotStandaloneBackup>:setup | Error | 0.00 | test_snapshots.py |
| test_03_ssvm_internals | Failure | 0.80 | test_ssvm.py |
| test_05_stop_ssvm | Error | 4811.60 | test_ssvm.py |
| test_07_reboot_ssvm | Error | 911.74 | test_ssvm.py |
| test_09_reboot_ssvm_forced | Error | 399.90 | test_ssvm.py |
| test_01_create_template | Error | 1.18 | test_templates.py |
| test_CreateTemplateWithDuplicateName | Error | 1.13 | test_templates.py |
| test_02_create_template_with_checksum_sha1 | Error | 65.61 | test_templates.py |
| test_03_create_template_with_checksum_sha256 | Error | 65.63 | test_templates.py |
| test_04_create_template_with_checksum_md5 | Error | 65.62 | test_templates.py |
| test_05_create_template_with_no_checksum | Error | 65.60 | test_templates.py |
| ContextSuite context=TestTemplates>:setup | Error | 284.41 | test_templates.py |
| test_01_volume_usage | Error | 131.82 | test_usage.py |

@github-actions

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@sureshanaparti
Contributor

@slavkap - great job! I've identified some minor issues that need to be addressed:

Resource Cleanup: Failed snapshots left in unusable "Allocated" state - need proper rollback/cleanup
UI Label: Shows label.useStorageReplication instead of "Use storage replication"

@slavkap can you check these issues and resolve the conflicts?

slavkap added 15 commits August 1, 2025 11:52
…n zones

Added support to copy a snapshot to another StorPool primary storage in
different zones.
Added drop down to choose the primary storage pools to copy a snapshot
Small fixes
hide the primary storage from the users in the UI
refactor smoke test
fix copy when recurring snapshot
fix UI after rebasing
Pool type to volumes that are created from snapshots
Added StorPool tags to snapshots that are copied from remote
@slavkap
Contributor Author

slavkap commented Aug 1, 2025

@blueorangutan package

@blueorangutan

@slavkap a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@slavkap
Contributor Author

slavkap commented Aug 1, 2025

Thank you, @rosi-shapeblue, for the detailed information!

The UI issue with the label has been fixed in the latest commit. Additionally, snapshot allocation will no longer occur if the storage doesn't support copying to another zone.

I’ve tested snapshot copying between zones using StorPool and confirmed that this feature doesn't break the existing functionality for copying snapshots between zones via secondary storage. Testing was also done with NFS as the primary storage.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14503

@sureshanaparti
Contributor

@rosi-shapeblue can you check if your concerns are addressed?

Collaborator

@RosiKyu RosiKyu left a comment

LGTM. The issues that were discussed have been addressed.

@sureshanaparti
Contributor

@blueorangutan test

@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-14022)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 47587 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9478-t14022-kvm-ol8.zip
Smoke tests completed. 146 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
