[Zookeeper] Server error causing client connection issue #2129
Comments
Encountered another Zookeeper server error which caused submission errors.
Another occurrence at 2024-12-15 17:36. This time it happened during an Escholarship submission, which has very large Zookeeper payload data (up to 50K) as well as Unicode data (possibly corrupt). Submitted 50 Eschol objects to Stage, but could not replicate the problem.
Here are stack traces from a) the client, b) the ZK worker, and c) the ZK leader:
a)
b)
c)
Another Zookeeper error during an Escholarship submission.
|
The following documentation from the Zookeeper admin page states that Zookeeper is designed to handle data payloads on the order of kilobytes. If size were the issue, we would not be receiving the socket errors.
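As a rough check on the size argument, the sketch below compares a payload's encoded size against ZooKeeper's default `jute.maxbuffer` value (0xfffff bytes, just under 1 MB). The helper names are hypothetical, the comparison ignores request framing overhead, and it assumes the default buffer setting has not been changed on these servers.

```python
# Default jute.maxbuffer in ZooKeeper: 0xfffff bytes (just under 1 MB).
# Assumption: the cluster runs with this default.
JUTE_MAXBUFFER_DEFAULT = 0xFFFFF


def payload_size_bytes(payload: str) -> int:
    """Size of the payload as it would be sent, encoded as UTF-8."""
    return len(payload.encode("utf-8"))


def fits_default_buffer(payload: str, limit: int = JUTE_MAXBUFFER_DEFAULT) -> bool:
    """Rough check only: ignores the protocol overhead around the data."""
    return payload_size_bytes(payload) < limit


if __name__ == "__main__":
    sample = "x" * 50_000  # stand-in for a ~50K submission payload
    print(payload_size_bytes(sample), fits_default_buffer(sample))
```

Under that assumption, a 50K payload sits well below the limit, and multi-byte Unicode characters only inflate the byte count by a small factor.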
The Zookeeper error (noted above) stems from Worker 4 being unable to read data from Worker 3.
The workers are distributed across 3 AZs. This is to increase reliability. Would it be easier if all Zookeeper nodes were in the same AZ? Does this pose a realistic risk?
Running on Zookeeper worker 05 is a logger that will snapshot "srvr" info on all 5 workers every 15 minutes. This info will show:
Reminder: Kill process after holiday break. It is located at:
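For reference, a logger of this kind could look roughly like the sketch below. This is not the script running on worker 05. It assumes the `srvr` four-letter command is reachable on client port 2181, and the function names and host list are hypothetical.

```python
import socket
import time
from datetime import datetime, timezone


def four_letter_word(host: str, port: int = 2181, cmd: str = "srvr",
                     timeout: float = 5.0) -> str:
    """Send a ZooKeeper four-letter command and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_srvr(text: str) -> dict:
    """Turn 'Key: value' lines from srvr output into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def snapshot(hosts: list) -> dict:
    """Collect srvr info from every host; record the error if one is down."""
    result = {}
    for host in hosts:
        try:
            result[host] = parse_srvr(four_letter_word(host))
        except OSError as exc:
            result[host] = {"error": str(exc)}
    return result


def run_forever(hosts: list, interval_seconds: int = 15 * 60) -> None:
    """Print one timestamped snapshot per interval (15 minutes by default)."""
    while True:
        stamp = datetime.now(timezone.utc).isoformat()
        print(stamp, snapshot(hosts), flush=True)
        time.sleep(interval_seconds)
```

Calling `run_forever([...])` with the five worker hostnames would print one snapshot per interval; fields such as `Mode` and `Latency min/avg/max` make it possible to see which node was leader around the time of an error.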
On 12/03 at 02:05:30, uc3-mrtzk-prd05, which was the leader, encountered a write error. This triggered an error on all peers and eventually caused Merritt Ingest client-side errors.
Here is the stack trace on worker 05:
Let's monitor for now.