docs: explain panic mode #4990
Signed-off-by: Huabing Zhao <[email protected]>
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #4990 +/- ##
==========================================
- Coverage 66.77% 66.71% -0.06%
==========================================
Files 209 209
Lines 32052 32055 +3
==========================================
- Hits 21404 21387 -17
- Misses 9374 9387 +13
- Partials 1274 1281 +7
api/v1alpha1/healthcheck_types.go
Outdated
@@ -9,6 +9,11 @@ import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// HealthCheck configuration to decide which endpoints
// are healthy and can be used for routing.
//
// Please note that Envoy load balancer may behave differently when lots of endpoints are unhealthy because of the "panic mode". |
// Note: Once the overall health of the backendRef drops below 50% (e.g. a backendRef having 10 endpoints
// with more than 5 unhealthy endpoints), the health check is ignored for the remaining endpoints i.e. they are not
// removed from the load balancing pool. This prevents cascading failures and retry storms in the distributed system
the health check is ignored for the remaining endpoints i.e. they are not
removed from the load balancing pool.
This description may not be accurate. I've verified the following behavior in my test cluster:
When the percentage of unhealthy endpoints reaches 50%, the health check is not ignored: the unhealthy endpoints are still marked as unhealthy. However, the load balancer distributes requests across all endpoints, both unhealthy and healthy.
I guess the reason is to prevent a large number of requests from being sent to the few remaining healthy endpoints, which could overwhelm them and tear down the whole cluster.
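The behavior described above can be sketched roughly as follows. This is only an illustration of the selection logic, not Envoy's or Envoy Gateway's actual code: the `Host` type, the `eligibleHosts` function, and the exact boundary comparison (strictly below the threshold) are assumptions made for this example. The 50% value matches Envoy's default panic threshold.

```go
package main

import "fmt"

// Host is a hypothetical endpoint record used only for this sketch.
type Host struct {
	Addr    string
	Healthy bool // result of the most recent health check
}

// eligibleHosts returns the set of hosts a load balancer would choose from.
// panicThreshold is the minimum percentage of healthy hosts (assumed 0-100).
// Health status is never discarded; it is only ignored for host selection
// while the healthy percentage is below the threshold.
func eligibleHosts(hosts []Host, panicThreshold float64) []Host {
	if len(hosts) == 0 {
		return nil
	}
	var healthy []Host
	for _, h := range hosts {
		if h.Healthy {
			healthy = append(healthy, h)
		}
	}
	healthyPct := 100 * float64(len(healthy)) / float64(len(hosts))
	if healthyPct < panicThreshold {
		// Panic mode: spread traffic across all hosts, healthy or not,
		// so the few remaining healthy hosts are not overwhelmed.
		return hosts
	}
	// Normal mode: route only to healthy hosts.
	return healthy
}

func main() {
	// 10 endpoints, 6 of them unhealthy (40% healthy).
	hosts := make([]Host, 10)
	for i := range hosts {
		hosts[i] = Host{Addr: fmt.Sprintf("10.0.0.%d", i+1), Healthy: i >= 6}
	}
	fmt.Println(len(eligibleHosts(hosts, 50))) // prints 10: all hosts are eligible
}
```

With only 4 unhealthy endpoints out of 10, the same call would return just the 6 healthy hosts.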
Explain the "panic mode", in which failed endpoints exceed 50%, as users were asking why requests were sent to unhealthy endpoints.