Comments (18)
@chriscchien Does this also happen in v1.6.x and v1.5.x?
This issue cannot be reproduced on v1.5.5 and v1.6.2.
from longhorn.
Uninstallation? If the backup target is valid, it won't trigger the issue.
Ah, typo.
Whether the backup target is valid or invalid, both cases walk through the `UpdateBackupTarget` call below. Why is only the invalid case triggered? Is there anything I missed?
/controller/uninstall_controller.go#L322-L328
```go
} else if len(backupTargets) > 0 {
	for _, bt := range backupTargets {
		if _, err = c.ds.UpdateBackupTarget(bt); err != nil {
			return errors.Wrap(err, "failed to touch the backup target CR for API version migration")
		}
	}
}
```
https://github.com/longhorn/longhorn-manager/blob/master/controller/backup_target_controller.go#L383-L391
It is due to the frequent updates of an invalid backup target.
Although the error message is otherwise the same, it embeds a different timestamp each time, which always leads to an update.
from longhorn.
@chriscchien Does this also happen in v1.6.x and v1.5.x?
This issue cannot be reproduced on v1.5.5 and v1.6.2.
Should we backport this, since longhorn/longhorn-manager#2812 has already been backported to 1.6.2 and 1.5.6 (unreleased)?
from longhorn.
cc @mantissahz
from longhorn.
Whether the backup target is valid or invalid, both cases walk through the `UpdateBackupTarget` call below. Why is only the invalid case triggered? Is there anything I missed?
I think a conflict is very unlikely in the case of a valid BackupTarget. But in the invalid case, #8224 causes frequent updates, so a conflict is quite likely.
from longhorn.
@chriscchien Does this also happen in v1.6.x and v1.5.x?
from longhorn.
A quick scan of the longhorn-manager logs in the support bundle shows lots of errors like the following within a single second:
2024-06-20T04:18:44.848117979Z time="2024-06-20T04:18:44Z" level=error msg="Failed to get info from backup store" func="controller.(*BackupTargetController).reconcile" file="backup_target_controller.go:389" controller=longhorn-backup-target cred= error="failed to list backup volumes in nfs://longhorn-test-nfs-svc.defsdfsfdault:/opt/backupstore: error listing backup volume names: failed to execute: /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-master-head/longhorn [/var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-master-head/longhorn backup ls --volume-only nfs://longhorn-test-nfs-svc.defsdfsfdault:/opt/backupstore], output cannot mount nfs longhorn-test-nfs-svc.defsdfsfdault:/opt/backupstore, options [nfsvers=4.0 actimeo=1 soft timeo=300 retry=2]: vers=4.0: mount failed: exit status 32\nMounting command: mount\nMounting arguments: -t nfs4 -o nfsvers=4.0,actimeo=1,soft,timeo=300,retry=2 longhorn-test-nfs-svc.defsdfsfdault:/opt/backupstore /var/lib/longhorn-backupstore-mounts/longhorn-test-nfs-svc_defsdfsfdault/opt/backupstore\nOutput: mount.nfs4: Failed to resolve server longhorn-test-nfs-svc.defsdfsfdault: Name or service not known\n: vers=4.1: mount failed: exit status 32\nMounting command: mount\nMounting arguments: -t nfs4 -o nfsvers=4.1,actimeo=1,soft,timeo=300,retry=2 longhorn-test-nfs-svc.defsdfsfdault:/opt/backupstore /var/lib/longhorn-backupstore-mounts/longhorn-test-nfs-svc_defsdfsfdault/opt/backupstore\nOutput: mount.nfs4: Failed to resolve server longhorn-test-nfs-svc.defsdfsfdault: Name or service not known\n: vers=4.2: mount failed: exit status 32\nMounting command: mount\nMounting arguments: -t nfs4 -o nfsvers=4.2,actimeo=1,soft,timeo=300,retry=2 longhorn-test-nfs-svc.defsdfsfdault:/opt/backupstore /var/lib/longhorn-backupstore-mounts/longhorn-test-nfs-svc_defsdfsfdault/opt/backupstore\nOutput: mount.nfs4: Failed to resolve server longhorn-test-nfs-svc.defsdfsfdault: Name or service not known\n: cannot mount using NFSv4\n, 
stderr warning: GOCOVERDIR not set, no coverage data emitted\ntime=\"2024-06-20T04:18:44Z\" level=warning msg=\"Trying reading mount point /var/lib/longhorn-backupstore-mounts/longhorn-test-nfs-svc_defsdfsfdault/opt/backupstore to make sure it is healthy\" func=util.EnsureMountPoint file=\"util.go:309\" pkg=nfs\ntime=\"2024-06-20T04:18:44Z\" l
...
Frequently updating the backup target status blocks the uninstall procedure.
It should only poll the backup target status once every `pollInterval`.
from longhorn.
What is the poll interval set to that causes these frequent updates? The default should be 300 seconds.
Would the workaround be to disable polling by setting it to 0?
from longhorn.
I think the frequent updates are caused by the error messages not being identical, since they include a timestamp such as `E0620 04:18:45.681625 20213 mount_linux.go:236]`.
The workaround would be to empty the backup target URL first.
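One possible mitigation for the timestamp-driven updates, sketched under the assumption that the only volatile parts of the message are klog headers and logrus `time=` fields: normalize the message before comparing it with the stored status. `normalizeErrorMessage` and both regexes are hypothetical; this is not the fix that was merged for this issue.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns for the volatile parts of the mount errors:
// klog headers such as "E0620 04:18:45.681625 20213 mount_linux.go:236]"
// and logrus fields such as time="2024-06-20T04:18:44Z".
var (
	klogHeader = regexp.MustCompile(`[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6}\s+\d+ [^\s\]]+:\d+\]`)
	logrusTime = regexp.MustCompile(`time="[^"]*"`)
)

// normalizeErrorMessage replaces timestamps with fixed placeholders so two
// otherwise identical errors compare equal, while real differences survive.
func normalizeErrorMessage(msg string) string {
	msg = klogHeader.ReplaceAllString(msg, "<klog>")
	return logrusTime.ReplaceAllString(msg, `time="<ts>"`)
}

func main() {
	a := `E0620 04:18:45.681625 20213 mount_linux.go:236] Mount failed: exit status 32`
	b := `E0620 04:18:46.102938 20213 mount_linux.go:236] Mount failed: exit status 32`
	fmt.Println(a == b)                                               // false
	fmt.Println(normalizeErrorMessage(a) == normalizeErrorMessage(b)) // true
}
```

The trade-off is that the stored message no longer reflects the exact time of the latest failure; a separate timestamp field would be needed if that matters.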
from longhorn.
" controller=longhorn-uninstall error="failed to touch the backup target CR for API version migration: Operation cannot be fulfilled on backuptargets.longhorn.io \"default\": the object has been modified; please apply your changes to the latest version and try again"
If it runs into `the object has been modified`, the failed update can be ignored. The purpose of the touch (update) is to trigger the API version migration, and the `the object has been modified` error indicates that the resource has already been updated and should therefore be migrated.
from longhorn.
Pre Ready-For-Testing Checklist
- Where are the reproduce steps/test steps documented?
  The reproduce steps/test steps are:
  - Set up an invalid backup target
  - Uninstall Longhorn
- Is there a workaround for the issue? If so, where is it documented?
  The workaround is:
  - Empty the backup target URL.
- Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore, etc.) (including `backport-needed/*`)?
  The PR is at longhorn/longhorn-manager#2897
- Which areas/issues might this PR potentially impact?
  Area
  Issues
from longhorn.
I think the frequent updates are caused by the error messages not being identical (they include a timestamp such as `E0620 04:18:45.681625 20213 mount_linux.go:236]`).
Related to #8224.
from longhorn.
" controller=longhorn-uninstall error="failed to touch the backup target CR for API version migration: Operation cannot be fulfilled on backuptargets.longhorn.io \"default\": the object has been modified; please apply your changes to the latest version and try again"
Does this mean that if Longhorn is installed from the master branch (w/o the fix) with a valid/invalid backup target configured, the installation will always fail?
from longhorn.
" controller=longhorn-uninstall error="failed to touch the backup target CR for API version migration: Operation cannot be fulfilled on backuptargets.longhorn.io \"default\": the object has been modified; please apply your changes to the latest version and try again"
Does this mean that if Longhorn is installed from the master branch (w/o the fix) with a valid/invalid backup target configured, the installation will always fail?
Uninstallation? If the backup target is valid, it won't trigger the issue.
from longhorn.
Uninstallation? If the backup target is valid, it won't trigger the issue.
Ah, typo.
Whether the backup target is valid or invalid, both cases walk through the `UpdateBackupTarget` call below. Why is only the invalid case triggered? Is there anything I missed?
/controller/uninstall_controller.go#L322-L328
```go
} else if len(backupTargets) > 0 {
	for _, bt := range backupTargets {
		if _, err = c.ds.UpdateBackupTarget(bt); err != nil {
			return errors.Wrap(err, "failed to touch the backup target CR for API version migration")
		}
	}
}
```
from longhorn.
Uninstallation? If the backup target is valid, it won't trigger the issue.
Ah, typo.
Whether the backup target is valid or invalid, both cases walk through the `UpdateBackupTarget` call below. Why is only the invalid case triggered? Is there anything I missed?
/controller/uninstall_controller.go#L322-L328
```go
} else if len(backupTargets) > 0 {
	for _, bt := range backupTargets {
		if _, err = c.ds.UpdateBackupTarget(bt); err != nil {
			return errors.Wrap(err, "failed to touch the backup target CR for API version migration")
		}
	}
}
```
https://github.com/longhorn/longhorn-manager/blob/master/controller/backup_target_controller.go#L394-L397
It is due to the frequent updates of an invalid backup target.
Although the error message is otherwise the same, it embeds a different timestamp each time, which always leads to an update.
from longhorn.
Well explained @derekbit @ejweber
from longhorn.
Verified as passing on Longhorn master (longhorn-manager b19161) with the test steps.
Uninstallation succeeds when an invalid backup target is set.
from longhorn.