
Allow option to configure the kubernetes controllers MaxConcurrentReconciles #3021

Open
StevenPJ opened this issue Oct 24, 2023 · 3 comments

@StevenPJ

What would you like added?


We would like to be able to configure the MaxConcurrentReconciles of the runnerpod and runner controllers.

Why is this needed?

The runnerpod_controller currently makes a lot of individual patches per runner pod (each one adds a different annotation to the pod). The controller does not pass any options to the underlying k8s controller, so it defaults to a MaxConcurrentReconciles of 1. There may be a way to configure this through system configuration, but we could not find any docs in ARC or the controller-runtime project, and from looking at the code it appears the option needs to be passed in when the controller is created. There may be an alternative way of configuring/injecting the setting, as we are not Golang or kubernetes-controller experts, in which case we would appreciate a pointer to the correct docs.
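
For reference, controller-runtime's builder accepts a controller.Options struct whose MaxConcurrentReconciles field sets the number of workers; if nothing is passed, it defaults to 1. A minimal sketch of that API (the RunnerPodReconciler type here is illustrative, not ARC's actual code):

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller"
)

// RunnerPodReconciler is an illustrative reconciler, not ARC's actual type.
type RunnerPodReconciler struct {
	client.Client
}

func (r *RunnerPodReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Reconcile logic elided; only the wiring below matters here.
	return ctrl.Result{}, nil
}

// SetupWithManager registers the reconciler with the manager. Without an
// explicit controller.Options, controller-runtime runs a single worker
// (MaxConcurrentReconciles = 1).
func (r *RunnerPodReconciler) SetupWithManager(mgr ctrl.Manager, maxConcurrent int) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&corev1.Pod{}).
		WithOptions(controller.Options{MaxConcurrentReconciles: maxConcurrent}).
		Complete(r)
}
```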

The high number of requests the controller makes to the k8s API, combined with the single-threaded nature of the controller worker, means it is constantly behind and under a lot of pressure during scaling. We have run multiple configurations with the HRA, from:

  • min: 0, max: 60
  • min: 60, max: 60
  • min: 100, max: 200
  • min: 100, max: 100

and we have found it performs best when no scaling is applied, i.e. min == max. Although the queue depth stays constant (at around 1000 requests in the queue), we observed more consistent performance this way.

Alternatives we've considered

  • We have tried using ARC V2 but ran into issues using a single org runner group due to limited availability of the listener pod (a separate issue, but it's why we haven't upgraded just yet)
  • We have configured different resources for the controller
  • We have configured different replica/min/max settings and observed the same behaviour for all.

Additional context

We are using the latest version of the Summerwind controller, with ephemeral runners, a RunnerDeployment, and webhook-driven scaling. We have a few different RunnerDeployments, including our main one in the default organisation runner group. We forked the code, manually configured MaxConcurrentReconciles to 8, and noticed a very clear correlation.

We use Kyverno in our cluster, which takes anywhere from 20 ms to 2 seconds to handle requests via the k8s API server. The workers are currently single-threaded, and as the load increases it puts more and more pressure on Kyverno to be quick, which isn't scalable long term.


The screenshot below shows a clear correlation between the queue depth and the max number of concurrent reconciles.
[Screenshot: controller queue depth vs. MaxConcurrentReconciles]

We did see the number of reconcile errors increase due to dirty reads; however, these seem to be retried successfully, and we see a big difference in our cluster (fewer pods stuck in terminating/completed) while offline runners are still being deregistered correctly.
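
For context, that retry behaviour appears to be controller-runtime's default: when Reconcile returns an error (e.g. a 409 Conflict from a stale read), the request goes back on the workqueue and is retried with exponential backoff. A rough illustration, reusing the illustrative reconciler type from the sketch above (the annotation key is made up for the example):

```go
func (r *RunnerPodReconciler) annotate(ctx context.Context, req ctrl.Request) error {
	var pod corev1.Pod
	if err := r.Get(ctx, req.NamespacedName, &pod); err != nil {
		return client.IgnoreNotFound(err)
	}

	if pod.Annotations == nil {
		pod.Annotations = map[string]string{}
	}
	pod.Annotations["example.com/processed"] = "true"

	if err := r.Update(ctx, &pod); err != nil {
		// With several workers racing on stale copies of the same pod, Update
		// can fail with a 409 Conflict (the "dirty read" case). Propagating the
		// error out of Reconcile requeues the request with backoff, so the next
		// attempt re-reads the object and normally succeeds.
		return err
	}
	return nil
}
```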

@StevenPJ added the community, enhancement, and needs triage labels on Oct 24, 2023
@StevenPJ changed the title from "Allow option to configure the kubernetes controller properties" to "Allow option to configure the kubernetes controllers MaxConcurrentReconciles" on Oct 24, 2023
@github-actions
Contributor

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

@ryaugusta

Hi @StevenPJ - have you found any solution to the above issue?

@mumoshu
Collaborator

mumoshu commented Nov 25, 2024

Hey! I have a similar patch for the latest ARC here: #3276 (comment)
Your feedback/testing is very much welcome; I'd turn it into a pull request once several people have confirmed it's effective.
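
To sketch the general idea (this is illustrative wiring, not the actual patch in #3276, and the flag name is hypothetical), the value could be read from a flag in the controller's entrypoint and passed through to each reconciler's SetupWithManager:

```go
package main

import (
	"flag"
	"os"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// Hypothetical flag name; the real patch may expose this differently.
	var maxConcurrentReconciles int
	flag.IntVar(&maxConcurrentReconciles, "runner-max-concurrent-reconciles", 1,
		"number of concurrent reconcile workers for the runner pod controller")
	flag.Parse()

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		os.Exit(1)
	}

	// RunnerPodReconciler is the illustrative type from the earlier sketch; in a
	// real change it would be the controller's existing reconciler.
	r := &RunnerPodReconciler{Client: mgr.GetClient()}
	if err := r.SetupWithManager(mgr, maxConcurrentReconciles); err != nil {
		os.Exit(1)
	}

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		os.Exit(1)
	}
}
```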

@Link- self-assigned this on Nov 29, 2024
@Link- added the attention label on Nov 29, 2024