
feat: continue routing to serving endpointslices during termination #4946

Open
fogninid wants to merge 1 commit into main
Conversation

fogninid commented Dec 18, 2024

Continue routing to endpoints while their graceful termination is in progress

What this PR does / why we need it:
Use the serving condition that was defined in kubernetes/kubernetes#92968.
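For illustration only, here is a minimal Go sketch of what such an endpoint looks like through the discovery v1 API: the pod is no longer Ready, but is still Serving while it Terminates (package name, pod IP, and values below are made up, not part of this PR):

```go
package sketch // hypothetical package, for illustration only

import discoveryv1 "k8s.io/api/discovery/v1"

func boolPtr(b bool) *bool { return &b }

// An endpoint in its graceful-termination window: no longer Ready, but still
// Serving. kubernetes/kubernetes#92968 introduced the Serving and Terminating
// conditions so consumers can keep routing to it instead of dropping traffic.
var terminatingButServing = discoveryv1.Endpoint{
	Addresses: []string{"10.0.0.1"}, // hypothetical pod IP
	Conditions: discoveryv1.EndpointConditions{
		Ready:       boolPtr(false),
		Serving:     boolPtr(true),
		Terminating: boolPtr(true),
	},
}
```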

Release Notes: Yes

fogninid requested a review from a team as a code owner on December 18, 2024 19:56
zhaohuabing (Member) left a comment


LGTM thanks!

arkodg (Contributor) commented Dec 31, 2024

@fogninid why should traffic be routed to a pod during graceful termination? IMO, with the right rolling update strategy in place, traffic should be shifted to the newer pod.

fogninid (Author) commented Jan 7, 2025

> @fogninid why should traffic be routed to a pod during graceful termination? IMO, with the right rolling update strategy in place, traffic should be shifted to the newer pod.

@arkodg the {Serving=true,Terminating=true} state was defined by Kubernetes in the PR I linked in the cover message.
My understanding is that this state exists so that backend applications can signal that traffic can still be routed to terminating pods, instead of being dropped and causing disruptions.
The clearest example is a rolling update of a StatefulSet with a single replica: the one terminating pod now has time to stop gracefully while still handling all incoming requests.

My expectation is that envoy-gateway behaves the same way as other networking solutions. The most explicit description of this logic I found is close to what Cilium is doing here.

envoy-gateway could do something similar: use all endpoints with state {Serving=true,Terminating=false} if any are available, and fall back to including those with {Serving=true,Terminating=true} if none are found.
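A minimal sketch of that fallback, assuming the discovery v1 EndpointSlice API (the package and helper names are hypothetical, not existing envoy-gateway code):

```go
package sketch // hypothetical package, for illustration only

import discoveryv1 "k8s.io/api/discovery/v1"

// isTrue treats a nil condition pointer as false.
func isTrue(b *bool) bool { return b != nil && *b }

// selectEndpoints prefers endpoints that are Serving and not Terminating, and
// falls back to Serving+Terminating endpoints only when none of the former exist.
func selectEndpoints(slices []discoveryv1.EndpointSlice) []discoveryv1.Endpoint {
	var active, terminating []discoveryv1.Endpoint
	for _, slice := range slices {
		for _, ep := range slice.Endpoints {
			if !isTrue(ep.Conditions.Serving) {
				continue // never route to endpoints that are not serving
			}
			if isTrue(ep.Conditions.Terminating) {
				terminating = append(terminating, ep)
			} else {
				active = append(active, ep)
			}
		}
	}
	if len(active) > 0 {
		return active
	}
	return terminating // only terminating endpoints are left; keep routing to them
}
```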

Would this be the right place to introduce this logic?

arkodg (Contributor) commented Jan 7, 2025

Thanks for sharing the link @fogninid. I found this blog https://kubernetes.io/blog/2022/12/30/advancements-in-kubernetes-traffic-engineering/#traffic-loss-from-load-balancers-during-rolling-updates, which highlights how these conditions can be used by Ingress Controllers:

> Consumers of the EndpointSlice API, such as Kube-proxy and Ingress Controllers, can now use these conditions to coordinate connection draining events, by continuing to forward traffic for existing connections but rerouting new connections to other non-terminating endpoints.

This would translate to: if an endpoint is not Ready but still Serving, we can set the LB endpoint status to DRAINING so that existing connections are not disrupted right away, while new connections go to other endpoints.
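For reference, a sketch of what that could look like in the generated xDS config, using go-control-plane types (the package name, address, and port below are made up, not output of this PR):

```go
package sketch // hypothetical package, for illustration only

import (
	corev3 "github.com/envoyproxy/go-control-plane/envoy/config/core/v3"
	endpointv3 "github.com/envoyproxy/go-control-plane/envoy/config/endpoint/v3"
)

// An endpoint that is not Ready but still Serving is kept in the cluster with
// DRAINING status: Envoy stops picking it for new connections but does not
// tear down the connections it already has.
var drainingEndpoint = &endpointv3.LbEndpoint{
	HealthStatus: corev3.HealthStatus_DRAINING,
	HostIdentifier: &endpointv3.LbEndpoint_Endpoint{
		Endpoint: &endpointv3.Endpoint{
			Address: &corev3.Address{
				Address: &corev3.Address_SocketAddress{
					SocketAddress: &corev3.SocketAddress{
						Address:       "10.0.0.1", // hypothetical pod IP
						PortSpecifier: &corev3.SocketAddress_PortValue{PortValue: 8080},
					},
				},
			},
		},
	},
}
```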

wdyt @envoyproxy/gateway-maintainers / @envoyproxy/gateway-reviewers

fogninid (Author) commented Jan 8, 2025

> Thanks for sharing the link @fogninid. I found this blog https://kubernetes.io/blog/2022/12/30/advancements-in-kubernetes-traffic-engineering/#traffic-loss-from-load-balancers-during-rolling-updates, which highlights how these conditions can be used by Ingress Controllers:
>
> Consumers of the EndpointSlice API, such as Kube-proxy and Ingress Controllers, can now use these conditions to coordinate connection draining events, by continuing to forward traffic for existing connections but rerouting new connections to other non-terminating endpoints.
>
> This would translate to: if an endpoint is not Ready but still Serving, we can set the LB endpoint status to DRAINING so that existing connections are not disrupted right away, while new connections go to other endpoints.
>
> wdyt @envoyproxy/gateway-maintainers / @envoyproxy/gateway-reviewers

The description in that link of the load balancer behavior is too simplified for my taste. There is no 1:1 mapping from a single field of the EndpointConditions to the expected LB state of the matching endpoint; rather, the full list of endpoints, together with the combination of their Serving and Terminating conditions, should map to a set of LB endpoints with appropriate states.

Again, my main reasoning is that new connections should not be dropped if any Serving=true endpoint is currently available (irrespective of whether it is terminating). This is the same behavior described for kube-proxy in this quote from that link:

> Starting in Kubernetes v1.26, kube-proxy enables the ProxyTerminatingEndpoints feature by default, which adds automatic failover and routing to terminating endpoints in scenarios where the traffic would otherwise be dropped. More specifically, when there is a rolling update and a Node only contains terminating Pods, kube-proxy will route traffic to the terminating Pods based on their readiness. In addition, kube-proxy will actively fail the health check NodePort if there are only terminating Pods available. By doing so, kube-proxy alerts the external load balancer that new connections should not be sent to that Node but will gracefully handle requests for existing connections.

Based on the envoy config link you provided, I believe the k8s endpoints => envoy config mapping should look something like this:

| k8s endpoints | envoy config | notes |
| --- | --- | --- |
| [{Address1,Serving=true,Terminating=false}] | [{Address1,HealthStatus=HEALTHY}] | |
| [{Address1,Serving=false,Terminating=false}] | [{Address1,HealthStatus=UNHEALTHY}] | |
| [{Address1,Serving=false,Terminating=true}] | [{Address1,HealthStatus=UNHEALTHY}] | |
| [{Address1,Serving=true,Terminating=true}] | [{Address1,HealthStatus=HEALTHY}] | Endpoint1 enters termination phase (preStop running) |
| [{Address1,Serving=true,Terminating=true},{Address2,Serving=false,Terminating=false}] | [{Address1,HealthStatus=HEALTHY},{Address2,HealthStatus=UNHEALTHY}] | Endpoint2 appears but is not ready yet => allow full use of the terminating endpoint |
| [{Address1,Serving=true,Terminating=true},{Address2,Serving=true,Terminating=false}] | [{Address1,HealthStatus=DRAINING},{Address2,HealthStatus=HEALTHY}] | Endpoint2 becomes ready => drain the terminating endpoint |
| [{Address1,Serving=false,Terminating=true},{Address2,Serving=true,Terminating=false}] | [{Address1,HealthStatus=DRAINING},{Address2,HealthStatus=HEALTHY}] | Endpoint1 enters final termination phase (preStop done) |
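A hedged Go sketch of that mapping (package and helper names are hypothetical, not existing envoy-gateway code): each endpoint's health status depends not only on its own conditions, but also on whether any non-terminating serving endpoint exists in the same set.

```go
package sketch // hypothetical package, for illustration only

import (
	corev3 "github.com/envoyproxy/go-control-plane/envoy/config/core/v3"
	discoveryv1 "k8s.io/api/discovery/v1"
)

// isTrue treats a nil condition pointer as false.
func isTrue(b *bool) bool { return b != nil && *b }

// healthStatusFor follows the table above: a terminating endpoint keeps serving
// (HEALTHY) until a non-terminating serving endpoint appears, at which point it
// is switched to DRAINING; non-serving endpoints are UNHEALTHY unless they are
// in their final termination phase and a replacement is available.
func healthStatusFor(ep discoveryv1.Endpoint, all []discoveryv1.Endpoint) corev3.HealthStatus {
	hasActive := false
	for _, other := range all {
		if isTrue(other.Conditions.Serving) && !isTrue(other.Conditions.Terminating) {
			hasActive = true
			break
		}
	}
	serving, terminating := isTrue(ep.Conditions.Serving), isTrue(ep.Conditions.Terminating)
	switch {
	case serving && !terminating:
		return corev3.HealthStatus_HEALTHY
	case serving && terminating:
		if hasActive {
			return corev3.HealthStatus_DRAINING // a ready replacement exists: drain
		}
		return corev3.HealthStatus_HEALTHY // no replacement yet: keep full use
	case terminating && hasActive:
		return corev3.HealthStatus_DRAINING // final termination phase, replacement available
	default:
		return corev3.HealthStatus_UNHEALTHY // not serving
	}
}
```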
