Periodic Memcached & Postgres disconnection in k8s pods #228

t3hmrman · 2019-12-27T08:20:09Z

Two errors, one from memcached and one from postgres, were reported over email:

Logger django.pylibmc, from module django_pylibmc.memcached line 132:
Error generated by Anonymous user (not logged in) on zulip-74dff868dc-xw7wh deployment

Traceback (most recent call last):
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django_pylibmc/memcached.py", line 130, in get
    return super(PyLibMCCache, self).get(key, default, version)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/core/cache/backends/memcached.py", line 79, in get
    val = self._cache.get(key)
pylibmc.ConnectionError: error 3 from memcached_get(:1:6f2ebcbba6130c440f61a24276a10): (0x2504060) CONNECTION FAILURE(Connection reset by peer),  host: zulip-memcached:11211 -> libmemcached/io.cc:466


Deployed code:
- ZULIP_VERSION: 2.1.1
- version: docker


Request info: none



Logger root, from module zerver.worker.queue_processors line 152:
Error generated by Anonymous user (not logged in) on zulip-74dff868dc-xw7wh deployment

Traceback (most recent call last):
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zerver/lib/db.py", line 31, in execute
    return wrapper_execute(self, super().execute, query, vars)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zerver/lib/db.py", line 18, in wrapper_execute
    return action(sql, params)
psycopg2.OperationalError: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/zulip/deployments/2019-12-14-03-04-44/zerver/worker/queue_processors.py", line 135, in consume_wrapper
    self.consume(data)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zerver/worker/queue_processors.py", line 311, in consume
    user_profile = get_user_profile_by_id(event["user_profile_id"])
  File "/home/zulip/deployments/2019-12-14-03-04-44/zerver/lib/cache.py", line 168, in func_with_caching
    val = func(*args, **kwargs)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zerver/models.py", line 2074, in get_user_profile_by_id
    return UserProfile.objects.select_related().get(id=uid)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/db/models/query.py", line 374, in get
    num = len(clone)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/db/models/query.py", line 232, in __len__
    self._fetch_all()
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/db/models/query.py", line 1121, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/db/models/query.py", line 53, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 899, in execute_sql
    raise original_exception
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 889, in execute_sql
    cursor.execute(sql, params)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/db/utils.py", line 94, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/utils/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zulip-py3-venv/lib/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zerver/lib/db.py", line 31, in execute
    return wrapper_execute(self, super().execute, query, vars)
  File "/home/zulip/deployments/2019-12-14-03-04-44/zerver/lib/db.py", line 18, in wrapper_execute
    return action(sql, params)
django.db.utils.OperationalError: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.



Deployed code:
- ZULIP_VERSION: 2.1.1
- version: docker


Request info: none

It's not clear why the disconnection happened, but chat is still working properly and messages are persisted (this may be related to #225, with the new version reporting more errors than before).


t3hmrman · 2019-12-27T09:38:44Z

I want to add that this is intermittent -- I get the error every once in a while so I assume that zulip is reconnecting as necessary.

timabbott · 2019-12-30T18:38:41Z

My guess is the root cause is the same issue explained here:

https://zulip.readthedocs.io/en/latest/production/troubleshooting.html#disabling-unattended-upgrades

I.e. a service or container is being restarted, and then the errors related to that restart breaking connections are spread over time due to Zulip's recovery logic being per-process.
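To illustrate the per-process point, here is a toy simulation. It is not Zulip's actual recovery code, and `Backend`, `Worker`, and `simulate` are hypothetical names invented for this sketch. It assumes each worker process keeps its own long-lived connection and only notices that the connection is stale the next time it handles a request, so a single backend restart produces one error per worker, spread across whenever each worker is next used.

```python
import random


class Backend:
    """Stand-in for memcached or Postgres (hypothetical model)."""

    def __init__(self):
        self.generation = 0

    def restart(self):
        # A restart invalidates every connection opened before it.
        self.generation += 1


class Worker:
    """One process holding its own long-lived connection (hypothetical model)."""

    def __init__(self, name, backend):
        self.name = name
        self.backend = backend
        self.conn_generation = backend.generation

    def handle_request(self):
        """Return True on success, False if a stale connection caused an error."""
        if self.conn_generation != self.backend.generation:
            # The request fails once; the worker then reconnects on its own.
            self.conn_generation = self.backend.generation
            return False
        return True


def simulate(num_workers=5, ticks=50, seed=0):
    """Restart the backend once, then route one request per tick to a random worker.

    Returns a dict mapping worker name to the tick at which it reported its error.
    """
    rng = random.Random(seed)
    backend = Backend()
    workers = [Worker(f"worker-{i}", backend) for i in range(num_workers)]
    backend.restart()  # the single restart happens before any traffic

    error_ticks = {}
    for tick in range(1, ticks + 1):
        worker = rng.choice(workers)
        if not worker.handle_request():
            error_ticks[worker.name] = tick
    return error_ticks


if __name__ == "__main__":
    for name, tick in sorted(simulate().items(), key=lambda item: item[1]):
        print(f"{name}: error at tick {tick}")
```

In this model the restart happens once, yet the error reports arrive at different times, one per worker, which would match a pattern of occasional emails while the service otherwise keeps working.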

stratosgear · 2020-03-31T08:52:08Z

Just came here to say I have the same issues, but I am running the docker-zulip installation, which I think does not auto-update the individual containers. And I'm getting way too many emails (>20-30 per day) while the docker swarm is stable with no restarts...

timabbott · 2020-04-03T20:01:15Z

Are they specifically the sort of postgres/memcached connection failure tracebacks shown above?

stratosgear · 2020-04-04T11:26:31Z

OK, opened #426 and zulip/zulip#14456 to avoid any further confusion.

t3hmrman changed the title ~~Memcached & Postgres disconnection~~ Periodic Memcached & Postgres disconnection in k8s pods Dec 27, 2019

timabbott closed this as completed Dec 30, 2019
