Periodic Memcached & Postgres disconnection in k8s pods #228

Closed
t3hmrman opened this issue Dec 27, 2019 · 5 comments


t3hmrman commented Dec 27, 2019

Two errors occurred in memcached and postgres which were reported over email:

Logger django.pylibmc, from module django_pylibmc.memcached line 132:
Error generated by Anonymous user (not logged in) on zulip-74dff868dc-xw7wh deployment

Traceback (most recent call last):
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django_pylibmc/memcached.py", line 130, in get
    return super(PyLibMCCache, self).get(key, default, version)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/core/cache/backends/memcached.py", line 79, in get
    val = self._cache.get(key)
pylibmc.ConnectionError: error 3 from memcached_get(:1:6f2ebcbba6130c440f61a24276a10): (0x2504060) CONNECTION FAILURE(Connection reset by peer),  host: zulip-memcached:11211 -> libmemcached/io.cc:466


Deployed code:
- ZULIP_VERSION: 2.1.1
- version: docker


Request info: none


Logger root, from module zerver.worker.queue_processors line 152:
Error generated by Anonymous user (not logged in) on zulip-74dff868dc-xw7wh deployment

Traceback (most recent call last):
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zerver/lib/db.py", line 31, in execute
    return wrapper_execute(self, super().execute, query, vars)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zerver/lib/db.py", line 18, in wrapper_execute
    return action(sql, params)
psycopg2.OperationalError: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/zulip/deployments/2019-12-14-03-04-44/zerver/worker/queue_processors.py", line 135, in consume_wrapper
    self.consume(data)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zerver/worker/queue_processors.py", line 311, in consume
    user_profile = get_user_profile_by_id(event["user_profile_id"])
  File "/home/zulip/deployments/2019-12-14-03-04-44/zerver/lib/cache.py", line 168, in func_with_caching
    val = func(*args, **kwargs)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zerver/models.py", line 2074, in get_user_profile_by_id
    return UserProfile.objects.select_related().get(id=uid)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/db/models/query.py", line 374, in get
    num = len(clone)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/db/models/query.py", line 232, in __len__
    self._fetch_all()
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/db/models/query.py", line 1121, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/db/models/query.py", line 53, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 899, in execute_sql
    raise original_exception
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 889, in execute_sql
    cursor.execute(sql, params)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/db/utils.py", line 94, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/utils/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zerver/lib/db.py", line 31, in execute
    return wrapper_execute(self, super().execute, query, vars)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zerver/lib/db.py", line 18, in wrapper_execute
    return action(sql, params)
django.db.utils.OperationalError: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.



Deployed code:
- ZULIP_VERSION: 2.1.1
- version: docker


Request info: none

It's not clear why the disconnection happened, but chat is still working properly and messages are persisted (this may be related to #225, with the new version reporting more errors than before).

@t3hmrman
Author

I want to add that this is intermittent: I get the error every once in a while, so I assume that Zulip is reconnecting as necessary.

@t3hmrman t3hmrman changed the title Memcached & Postgres disconnection Periodic Memcached & Postgres disconnection in k8s pods Dec 27, 2019
@timabbott
Member

My guess is the root cause is the same issue explained here:

https://zulip.readthedocs.io/en/latest/production/troubleshooting.html#disabling-unattended-upgrades

I.e. a service or container is being restarted, and then the errors related to that restart breaking connections are spread over time due to Zulip's recovery logic being per-process.
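
To illustrate why per-process recovery would spread the errors over time, here is a minimal simulation. It is only a sketch and not Zulip's actual code: the `Worker` class, the `period` values, and the restart step are all hypothetical. Each worker keeps its own long-lived connection and only notices that the backing service restarted the next time it uses that connection, so a single restart yields one error per process, each at a different time.

```python
class Worker:
    """Hypothetical worker process that holds its own long-lived connection."""

    def __init__(self, period):
        self.name = f"w{period}"
        self.period = period            # handles one event every `period` steps
        self.connected_generation = 0   # server incarnation this connection belongs to

    def handle_event(self, server_generation):
        """Use the connection. Return False if a connection error would be reported."""
        if self.connected_generation != server_generation:
            # Stale connection: the first use after the restart fails,
            # then this process (and only this process) reconnects.
            self.connected_generation = server_generation
            return False
        return True


def simulate(periods=(1, 3, 7, 20), steps=30, restart_at=10):
    """Return a list of (step, worker name) pairs at which an error is reported."""
    workers = [Worker(p) for p in periods]
    generation = 0
    errors = []
    for step in range(steps):
        if step == restart_at:
            generation += 1  # the service (e.g. a database container) restarts once
        for worker in workers:
            if step % worker.period == 0 and not worker.handle_event(generation):
                errors.append((step, worker.name))
    return errors


if __name__ == "__main__":
    print(simulate())
    # [(10, 'w1'), (12, 'w3'), (14, 'w7'), (20, 'w20')]
```

In this toy model a single restart at step 10 produces four separate error reports, the last one ten steps later, because the least busy worker is the last to touch its stale connection.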

@stratosgear

Just came here to say I have the same issues, but I am running the docker-zulip installation, which I think does not auto-update the individual containers. And I'm getting way too many emails (>20-30 per day) while the docker swarm is stable with no restarts...

@timabbott
Member

Are they specifically the sort of postgres/memcached connection failure tracebacks shown above?

@stratosgear

stratosgear commented Apr 4, 2020

OK, opened #426 and zulip/zulip#14456 to avoid any further confusion.
