Allow option to configure the kubernetes controllers MaxConcurrentReconciles #3021
Labels
attention (Requires attention), community (Community contribution), enhancement (New feature or request), needs triage (Requires review from the maintainers)
What would you like added?
We would like to be able to configure MaxConcurrentReconciles for the runnerpod and runner controllers.
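For reference, controller-runtime already exposes this knob through controller.Options when a controller is registered with the manager. Below is a minimal sketch, assuming the reconciler type is called RunnerPodReconciler and that the extra maxConcurrent parameter would be new; this is not ARC's current code, just an illustration of where the value would need to be passed.

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller"
)

// RunnerPodReconciler stands in for ARC's runner pod reconciler; only the
// pieces needed to show the option wiring are included here.
type RunnerPodReconciler struct {
	client.Client
}

func (r *RunnerPodReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// The real reconcile logic (the per-pod annotation patches) lives in ARC.
	return ctrl.Result{}, nil
}

// SetupWithManager registers the controller with the manager. The extra
// maxConcurrent parameter is the knob we are asking for: controller-runtime
// accepts it via controller.Options and otherwise defaults
// MaxConcurrentReconciles to 1.
func (r *RunnerPodReconciler) SetupWithManager(mgr ctrl.Manager, maxConcurrent int) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&corev1.Pod{}).
		WithOptions(controller.Options{
			MaxConcurrentReconciles: maxConcurrent,
		}).
		Complete(r)
}
```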
Why is this needed?
The runnerpod_controller currently makes a lot of individual patches per runner pod (each one adds a different annotation to the pod). The controller does not pass options to the underlying k8s controller, so it defaults to a MaxConcurrentReconciles of 1. There may be a way to configure this through system configuration, but we could not find any docs in ARC or the controller-runtime project, and from looking at the code it appears the value has to be passed in when the controller is created. There may also be an alternative means of configuring/injecting the setting; we are not Golang or kubernetes-controller experts, so if one exists we would appreciate a pointer to the correct docs.

The high number of requests the controller makes to the k8s API, combined with the single-threaded nature of the controller worker, means it is constantly behind and puts a lot of pressure on the cluster during scaling. We have run multiple configurations with the HRA, from:
min: 0, max: 60
min: 60, max: 60
min: 100, max: 200
min: 100, max: 100
and we have found it performs best when no scaling is applied, i.e. min == max. While the queue depth stays constant (at around 1000 requests in the queue), we observed more consistent performance this way.
Additional context
We are using the latest version of the Summerwind controller, with ephemeral runners in a RunnerDeployment and webhook-driven scaling. We have a few different RunnerDeployments, including our main one in the default organisation runner group. We forked the code, manually set MaxConcurrentReconciles to 8, and noticed a very clear correlation; a sketch of how this could be exposed at startup follows below.

We use Kyverno in our cluster, which takes anywhere from 20ms to 2 seconds to handle requests via the k8s API server. The workers are currently single-threaded, and as the load increases it puts more and more pressure on Kyverno to be quick, which isn't scalable long term.
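A minimal sketch of the kind of change our fork makes at startup, assuming a new --runner-max-concurrent-reconciles flag (the flag name and default are made up, not an existing ARC option) and the SetupWithManager signature from the earlier sketch:

```go
package main

import (
	"flag"
	"os"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// Illustrative flag; ARC does not currently expose this option.
	var maxConcurrentReconciles int
	flag.IntVar(&maxConcurrentReconciles, "runner-max-concurrent-reconciles", 1,
		"Maximum number of concurrent reconciles per runner controller.")
	flag.Parse()

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		os.Exit(1)
	}

	// RunnerPodReconciler is the type from the earlier sketch, assumed to be
	// in scope here; the flag value is threaded through to controller.Options.
	r := &RunnerPodReconciler{Client: mgr.GetClient()}
	if err := r.SetupWithManager(mgr, maxConcurrentReconciles); err != nil {
		os.Exit(1)
	}

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		os.Exit(1)
	}
}
```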
Our metrics show a clear correlation between the queue depth and the max number of concurrent reconciles. We did see the number of reconcile errors increase due to dirty reads; however, these seem to be retried successfully, and we see a big difference in our cluster (fewer pods stuck in terminating/completed), while offline runners are still being deregistered correctly.