Change LH default settings to fix maintenance mode issues #788
Conversation
Signed-off-by: Jian Wang <[email protected]>
No major concern from me on the Longhorn side.

We default to `block-if-contains-last-replica` because there are cluster upgrade strategies that remove nodes from a cluster after they have been drained and replace them with new nodes. `allow-if-last-replica-is-stopped` cannot protect data for single-replica volumes in those scenarios. Since Harvester controls the upgrade process, it is fine to use `allow-if-last-replica-is-stopped`, assuming the Harvester team is sure the situation I described will not occur.

`detach-manually-attached-volumes-when-cordoned` comes down to a balance of priorities. We chose to default it to `false` so users would not be surprised by a volume they purposely attached becoming detached during a cluster maintenance operation. We're not sure (in general) why users manually attach volumes, so it's hard to gauge the importance of this. If the Harvester support team is repeatedly running into issues in customer environments in which manual attachments disrupt upgrades, this is probably a good change.
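For reference, a minimal sketch of how these two upstream defaults would appear under the Longhorn chart's `defaultSettings` (the camelCase key names are assumptions based on the values.yaml linked in the PR description, not confirmed here):

```yaml
# Sketch of the upstream Longhorn chart defaults described above.
# Key names are assumed from the Longhorn chart's values.yaml.
defaultSettings:
  nodeDrainPolicy: block-if-contains-last-replica
  detachManuallyAttachedVolumesWhenCordoned: false
```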
@WebberHuang1118 @FrankYang0529 @Yu-Jack Can you check if we have any remaining manually attached paths or other concerns? Thanks.
Hi @ejweber,

For `node-drain-policy`: what is this situation? I am sure we did not remove the Longhorn nodes from the cluster when upgrading. I just want to confirm that this behavior is triggered by the user who owns the cluster rather than by Longhorn itself.

For `detach-manually-attached-volumes-when-cordoned`: I prefer this configuration to be `false`. As you mentioned, the user should know what happened to their volume when the node is cordoned. Actually, I checked the Harvester project, and it looks like we do not use manual attachment anywhere.
@Vicente-Cheng I will try to answer from the context of Harvester.

In Harvester's practice, production deployments are mostly based on bare-metal servers, and customers have scenarios that require node maintenance. In those cases, the customer expects that clicking the maintenance-mode button just works.

LH provides many options for various scenarios and for compatibility. Back in Harvester, we need to recheck each option and select the best match for customer convenience. We need to make a decision that balances between:

(1) Keep the current option: if a user accidentally/actively attaches a volume, they need to detach it themselves for maintenance mode to succeed.

(2) Use `detach-manually-attached-volumes-when-cordoned: true` so that such volumes are detached automatically.

BTW, please also check the document PR harvester/docs#619; it has more details about the issue and the updated Harvester document. Thanks.
Thanks @w13915984028. For `node-drain-policy` and `detach-manually-attached-volumes-when-cordoned`, I just wanted to make the content of these configs clear. The changes in this PR are OK with me.
LGTM. Thanks.
LGTM
Regarding Eric's comment:

> We default to `block-if-contains-last-replica` because there are cluster upgrade strategies that remove nodes from a cluster after they have been drained and replace them with new nodes. `allow-if-last-replica-is-stopped` cannot protect data for single-replica volumes in these scenarios. Since Harvester controls the upgrade process, it is fine to use `allow-if-last-replica-is-stopped`, assuming the Harvester team is sure the situation I described will not occur.

AFAIK we upgrade all nodes in place rather than removing and adding new nodes, so the data should remain intact throughout.

Harvester doesn't manually attach volumes for VM snapshot/backup; instead, we have made use of the LH Volume Attach/Detach implementation since Harvester v1.3.0 (harvester/harvester#4910).
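For illustration, a Longhorn AD attachment is expressed as a ticket on a `VolumeAttachment` resource rather than a manual attach call, so it does not hit the manual-attachment path discussed above. A rough sketch (resource and field names are assumptions from the Longhorn v1.5+ AD design; the names and IDs are hypothetical):

```yaml
# Sketch of a Longhorn AD ticket (fields assumed from the longhorn.io
# VolumeAttachment CRD introduced with the attach/detach refactor).
apiVersion: longhorn.io/v1beta2
kind: VolumeAttachment
metadata:
  name: pvc-volume-example        # hypothetical volume name
  namespace: longhorn-system
spec:
  volume: pvc-volume-example
  attachmentTickets:
    snapshot-controller-example:  # hypothetical ticket ID
      id: snapshot-controller-example
      type: snapshot-controller
      nodeID: node-1
```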
LGTM, thanks.
@mergify backport v1.4
✅ Backports have been created
Problem:
The Harvester maintenance mode is blocked by manually attached volumes and single-replica volumes.
Solution:
Change the LH default settings to allow node drain in such scenarios.
Refer:
https://github.com/longhorn/charts/blob/v1.6.x/charts/longhorn/values.yaml#L248
https://github.com/longhorn/charts/blob/ad73dc01239b7eeb25ff510ce8358578433d85a5/charts/longhorn/values.yaml#L250
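As a sketch of the intended change (using the same assumed `defaultSettings` keys as above), the new defaults would look like:

```yaml
# Hypothetical override reflecting this PR's intent: allow draining a node
# that holds the last (stopped) replica, and auto-detach manually attached
# volumes when the node is cordoned.
defaultSettings:
  nodeDrainPolicy: allow-if-last-replica-is-stopped
  detachManuallyAttachedVolumesWhenCordoned: true
```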
Related Issue:
harvester/harvester#6264
harvester/harvester#6266
Test plan:
Per the issue steps.
After a new installation, confirm the LH default settings:
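For example, after a fresh install the corresponding Longhorn `Setting` objects would be expected to look roughly like this (a sketch, assuming this PR's new defaults are applied; the `longhorn.io/v1beta2` Setting resource carries its value in a top-level `value` field):

```yaml
# Expected Longhorn settings after a new installation (sketch).
apiVersion: longhorn.io/v1beta2
kind: Setting
metadata:
  name: node-drain-policy
  namespace: longhorn-system
value: allow-if-last-replica-is-stopped
---
apiVersion: longhorn.io/v1beta2
kind: Setting
metadata:
  name: detach-manually-attached-volumes-when-cordoned
  namespace: longhorn-system
value: "true"
```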
Following PRs: