Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[reconfigurator] Avoid deploying expunged datasets to sled agents #7308

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

smklein
Copy link
Collaborator

@smklein smklein commented Jan 6, 2025

If a blueprint has marked a dataset as expunged, stop telling sled agents to deploy it.

Fixes #7304

@smklein
Copy link
Collaborator Author

smklein commented Jan 6, 2025

This does not resolve #6177 , and the issues raised by @davepacheco there are still open (namely, the ordering between {dataset,zone} {add,remove} which could cause issues during zone expungement). However, this solves the more scope-limited issue raised by #7304 : If a blueprint thinks a dataset is expunged, we stop telling the sled agents to instantiate it.

@smklein smklein requested a review from davepacheco January 6, 2025 19:06
@@ -969,7 +969,14 @@ impl From<BlueprintDatasetsConfig> for DatasetsConfig {
datasets: config
.datasets
.into_iter()
.map(|(id, d)| (id, d.into()))
.filter_map(|(id, d)| {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the actual fix - basically, converting from "Blueprint dataset" -> "Deployable dataset" format will filter out expunged datasets.

If this feels too magical, I can remove impl From here and instead create:

impl BlueprintDatasetsConfig {
  fn into_in_service_datasets(self) -> DatasetsConfig {
     ...
  }
}

(the implementation would be the same, but this could make it more explicit that we're filtering)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your suggestion, seems like a good refactor :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 0bca90f

@davepacheco
Copy link
Collaborator

Given the stuff mentioned in #7313: are we comfortable landing this before having fixed #7309? I think it should be fine. But I don't think it improves things without also doing #6177. And I think it makes the system more complicated to reason about in the meantime.

@smklein
Copy link
Collaborator Author

smklein commented Jan 8, 2025

Given the stuff mentioned in #7313: are we comfortable landing this before having fixed #7309? I think it should be fine. But I don't think it improves things without also doing #6177.

I agree that this shouldn't change the net effect of the system, without subsequent PRs.

And I think it makes the system more complicated to reason about in the meantime.

Not sure I agree with this one. IMO this PR inches us closer to "more correct" behavior because it makes the reconfigurator do the right thing with the APIs it has. I think it'll have the same impact on the Sled Agent as "not requesting expunged datasets" because they aren't getting deleted regardless.

I think it would make sense to do the following:

At this point: The reconfigurator would be sending a set of info to the sled agents that's correct, but not being carried out correctly, because things aren't being deleted.

Then we could:

@davepacheco
Copy link
Collaborator

IMO this PR inches us closer to "more correct" behavior because it makes the reconfigurator do the right thing with the APIs it has. I think it'll have the same impact on the Sled Agent as "not requesting expunged datasets" because they aren't getting deleted regardless.

Agreed!

I think it would make sense to do the following:

I agree that sequence should work. My worry here stems mainly from various recent discussions (like the R12/#7265 one) where I feel like we struggled to accurately predict how the system would behave in this area, particularly across various releases. #7309 itself is an example of a non-obvious consequence of a bunch of other reasonable-seeming things. I guess another way to say that is that if we land this fix, then a bug (6177) becomes load-bearing for zone expungement to keep working, which feels like a weird place to be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

reconfigurator does not try to remove expunged datasets
3 participants