bug: envoy initial fetch time out when CDS updated. #1035
Comments
rewatch
when receive delta watch request
Hey, thanks for opening the issue. I believe this is the same underlying issue as #1001, which has now been addressed in envoy. Can you confirm whether you still encounter this issue when using envoy 1.32.x, or after activating the runtime flag mentioned in the documentation?
Thank you so much, @valerian-roche, for the background catch-up. I'm using v1.30.6. I'll try the solution you mentioned.
By the way, I fully agree with you that we should not make any changes that would tie control-plane to envoy. I think the patch would introduce only a few changes, making control-plane work with old envoy versions without affecting other xDS clients. This would let users adopt control-plane solutions like envoy-gateway for Kubernetes without changing the data plane version. It would be great if you would reconsider it. Anyway, thank you again for taking the time😊
Given that envoy has now defaulted the fix, I feel quite strongly that this issue does not justify this abstraction leakage. Users of envoy-gateway do not have to update the data-plane version, as they can simply activate the runtime flag to address the issue in envoy < 1.32. All supported versions of envoy have a functional implementation of the EDS cache.
@alecholmez FYI I discussed PR #1034 here. Imo, as envoy enabled the fix by default in 1.32, we can keep the control-plane simpler.
I think this could still be an issue. If a user sets
I am unclear why a user would set this value, as it breaks the EDS behavior of envoy in such a case. Can you clarify which use-case is expected to rely on this?
@valerian-roche When initial_fetch_timeout is set to 0, there is no timeout. If CDS is updated without any further EDS changes, envoy hangs in the cluster initializing state, and the caches are not used.
I know that setting initial_fetch_timeout to a short duration may solve this case, but I think that is a workaround. As engineers, we should try to solve the problem at its root.
The case of
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.
The server should send one more EDS response whenever the ACK for a CDS update is received.
refs: https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol#xds-ack-nack
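To make the proposal concrete, here is a minimal sketch of the requested behavior. It is a toy model, not go-control-plane code: the type `edsResender` and its methods `sendCDS` and `onCDSAck` are hypothetical names invented for illustration, and the cluster name is made up. The only detail taken from this report is the CDS nonce `10`.

```go
package main

import "fmt"

// edsResender is a hypothetical, simplified model of one xDS stream.
// It is NOT part of go-control-plane; it only illustrates the proposal.
type edsResender struct {
	lastCDSNonce string   // nonce of the most recent CDS response sent
	edsClusters  []string // clusters in that response that use EDS
	resent       []string // EDS resources queued for re-sending
}

// sendCDS records the nonce and the EDS-backed clusters of a CDS response.
func (s *edsResender) sendCDS(nonce string, clusters []string) {
	s.lastCDSNonce = nonce
	s.edsClusters = clusters
}

// onCDSAck models receiving a CDS request that ACKs the given nonce.
// It queues one more EDS response for the clusters of the ACKed update
// and reports whether anything was queued. Stale nonces are ignored.
func (s *edsResender) onCDSAck(nonce string) bool {
	if nonce != s.lastCDSNonce {
		return false
	}
	s.resent = append(s.resent, s.edsClusters...)
	return true
}

func main() {
	s := &edsResender{}
	s.sendCDS("10", []string{"example-cluster"})

	fmt.Println(s.onCDSAck("9"), s.resent)  // stale ACK: nothing queued
	fmt.Println(s.onCDSAck("10"), s.resent) // matching ACK: EDS queued again
}
```

Whether a real control plane should behave this way is exactly what is debated in the comments; the sketch only shows what "one more EDS response per CDS ACK" would mean.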
In my test, after CDS was updated in the controller, no EDS response was sent, so cluster warming timed out.
As highlighted in my logs below:
The EDS update responses were sent between 08:17:55 and 08:18:46, before the CDS update (nonce=10) was ACKed by envoy.
At 08:21:46 (I set my envoy initial fetch timeout to 3min), the client side printed an initial fetch timeout, because no EDS response arrived after the last CDS update.
PS:
I'm using envoy gateway in my environment.
Some of the logs were added by my debug version here: