Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage controller: don't hold detached tenants in memory #10264

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

jcsp
Copy link
Collaborator

@jcsp jcsp commented Jan 3, 2025

Problem

Typical deployments of neon have some tenants that stay in use continuously, and a background churning population of tenants that are created and then fall idle, and are configured to Detached state. Currently, this churn of short lived tenants results in an ever-increasing memory footprint.

Closes: #9712

Summary of changes

  • At startup, filter to only load shards that don't have Detached policy
  • In process_result, check if a tenant's shards are all Detached and observed=={}, and if so drop them from memory
  • In tenant_location_conf and other tenant mutators, load the tenants' shards on-demand if they are not present

@jcsp jcsp added t/feature Issue type: feature, for new features or requests c/storage/controller Component: Storage Controller labels Jan 3, 2025
Copy link

github-actions bot commented Jan 3, 2025

7238 tests run: 6867 passed, 11 failed, 360 skipped (full report)


Failures on Postgres 17

Failures on Postgres 16

Failures on Postgres 15

Failures on Postgres 14

# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_tenant_s3_restore[release-pg14] or test_tenant_s3_restore[release-pg14] or test_tenant_s3_restore[release-pg15] or test_tenant_s3_restore[release-pg15] or test_tenant_s3_restore[release-pg16] or test_tenant_s3_restore[release-pg16] or test_tenant_s3_restore[release-pg17] or test_tenant_s3_restore[release-pg17] or test_tenant_s3_restore[release-pg17] or test_tenant_s3_restore[release-pg17] or test_tenant_s3_restore[debug-pg17]"
Flaky tests (3)

Postgres 17

Postgres 14

  • test_physical_replication_config_mismatch_too_many_known_xids: release-arm64

Test coverage report is not available

The comment gets automatically updated with the latest test results
dd47681 at 2025-01-06T19:00:20.682Z :recycle:

@jcsp jcsp marked this pull request as ready for review January 3, 2025 13:06
@jcsp jcsp requested a review from a team as a code owner January 3, 2025 13:06
@jcsp jcsp requested review from arpad-m and VladLazar January 3, 2025 13:06
storage_controller/src/persistence.rs Outdated Show resolved Hide resolved
storage_controller/src/service.rs Show resolved Hide resolved
storage_controller/src/service.rs Outdated Show resolved Hide resolved
@jcsp jcsp requested a review from VladLazar January 6, 2025 18:22
@jcsp
Copy link
Collaborator Author

jcsp commented Jan 6, 2025

Let's get this merged and give it a week in staging -- there's a lot of detach/attach churn there, so we should have an excellent chance of spotting any unforseen issues

Copy link
Contributor

@VladLazar VladLazar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!


I think attach hook handling is missing a maybe_load_tenant call around service.rs:1603 and that's causing the test failures.

storage_controller/src/service.rs Outdated Show resolved Hide resolved
@jcsp
Copy link
Collaborator Author

jcsp commented Jan 8, 2025

I think attach hook handling is missing a maybe_load_tenant call around service.rs:1603 and that's causing the test failures.

Yep, thanks for pointing that out.

@jcsp jcsp enabled auto-merge January 8, 2025 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage/controller Component: Storage Controller t/feature Issue type: feature, for new features or requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage controller: don't hold detached projects in memory
2 participants