Memcache error #737

Closed
mcrot opened this issue Oct 19, 2021 · 18 comments


mcrot commented Oct 19, 2021

Error message in production in some analyses results for the large surface UNCD:

Traceback (most recent call last):
  File "/app/topobank/taskapp/tasks.py", line 131, in perform_analysis
    result = analysis.function.eval(subject, **kwargs)
  File "/app/topobank/analysis/models.py", line 288, in eval
    return self.get_implementation(subject_type).eval(subject, **kwargs)
  File "/app/topobank/analysis/models.py", line 340, in eval
    return pyfunc(subject, **kwargs)
  File "/app/topobank/analysis/functions.py", line 674, in autocorrelation_for_surface
    nb_points_per_decade=nb_points_per_decade)
  File "/app/topobank/analysis/functions.py", line 577, in analysis_function_for_surface
    r, A = log_average(topographies, funcname_profile, unit, progress_callback=progress_callback, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/SurfaceTopography/Container/Averaging.py", line 65, in log_average
    progress_callback(i, len(self))
  File "/app/topobank/analysis/functions.py", line 574, in <lambda>
    progress_callback = None if progress_recorder is None else lambda i, n: progress_recorder.set_progress(i + 1, n)
  File "/opt/conda/lib/python3.7/site-packages/celery_progress/backend.py", line 48, in set_progress
    meta=meta
  File "/opt/conda/lib/python3.7/site-packages/celery/app/task.py", line 972, in update_state
    task_id, meta, state, request=self.request, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/celery/backends/base.py", line 483, in store_result
    request=request, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/celery/backends/base.py", line 903, in _store_result
    current_meta = self._get_task_meta_for(task_id)
  File "/opt/conda/lib/python3.7/site-packages/celery/backends/base.py", line 925, in _get_task_meta_for
    meta = self.get(self.get_key_for_task(task_id))
  File "/opt/conda/lib/python3.7/site-packages/celery/backends/cache.py", line 120, in get
    return self.client.get(key)
pylibmc.Error: error 15 from memcached_get(celery-task-meta-41098c7d-47e0-4)

Not sure yet what error 15 means. It happened during high load on the Celery workers.

@mcrot mcrot added the bug Something isn't working label Oct 19, 2021
@mcrot mcrot added this to the v0.16.1 milestone Oct 19, 2021
@mcrot mcrot self-assigned this Oct 19, 2021

mcrot commented Oct 19, 2021

I'll try to recalculate all failed analyses once the heavy workload is over. Maybe this is related to limited resources and will go away when we move the workers to a more powerful machine.


mcrot commented Oct 22, 2021

For the published surface 944, "error 21" occurred:

Analysis failed for subject 100k_sample10_FIXEDv2p1.asc. Traceback:

Traceback (most recent call last):
  File "/app/topobank/taskapp/tasks.py", line 131, in perform_analysis
    result = analysis.function.eval(subject, **kwargs)
  File "/app/topobank/analysis/models.py", line 288, in eval
    return self.get_implementation(subject_type).eval(subject, **kwargs)
  File "/app/topobank/analysis/models.py", line 340, in eval
    return pyfunc(subject, **kwargs)
  File "/app/topobank/analysis/functions.py", line 847, in scale_dependent_slope
    nb_points_per_decade=nb_points_per_decade)
  File "/app/topobank/analysis/functions.py", line 766, in scale_dependent_roughness_parameter
    process_series_reliable_unreliable(xname, x_kwargs, is_reliable_visible=True)
  File "/app/topobank/analysis/functions.py", line 744, in process_series_reliable_unreliable
    distances, rms_values_sq = topography.scale_dependent_statistical_property(**func_kwargs)
  File "/opt/conda/lib/python3.7/site-packages/SurfaceTopography/HeightContainer.py", line 89, in func
    return self._functions[name](self, *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/SurfaceTopography/Generic/ScaleDependentStatistics.py", line 129, in scale_dependent_statistical_property
    progress_callback=progress_callback)
  File "/opt/conda/lib/python3.7/site-packages/SurfaceTopography/HeightContainer.py", line 89, in func
    return self._functions[name](self, *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/SurfaceTopography/Uniform/Derivative.py", line 294, in derivative
    progress_callback(i * nb_scale_factors + j, len(operators) * nb_scale_factors)
  File "/app/topobank/analysis/functions.py", line 760, in <lambda>
    lambda i, n: progress_recorder.set_progress(i + 1, fac * n)
  File "/opt/conda/lib/python3.7/site-packages/celery_progress/backend.py", line 48, in set_progress
    meta=meta
  File "/opt/conda/lib/python3.7/site-packages/celery/app/task.py", line 972, in update_state
    task_id, meta, state, request=self.request, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/celery/backends/base.py", line 483, in store_result
    request=request, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/celery/backends/base.py", line 909, in _store_result
    self._set_with_state(self.get_key_for_task(task_id), self.encode(meta), state)
  File "/opt/conda/lib/python3.7/site-packages/celery/backends/base.py", line 786, in _set_with_state
    return self.set(key, value)
  File "/opt/conda/lib/python3.7/site-packages/celery/backends/cache.py", line 126, in set
    return self.client.set(key, value, self.expires)
pylibmc.Error: error 21 from memcached_set

Error 15 also occurred.


mcrot commented Oct 22, 2021

This seems to be a memory issue: if I retrigger single analyses after the heavy workload has ended, everything is fine and there are no more errors. So I'll postpone this check; we need to start the workers for version 0.16.1 with extended resources.

@mcrot mcrot modified the milestones: v0.16.1, v0.17.0 Oct 22, 2021
pastewka commented:

This still happens, even for quite small topographies. I am not sure it is actually a memory error. It also appears only for the SDRP.

(screenshot attached)


mcrot commented Oct 28, 2021

Maybe this is related to this issue: antonagestam/collectfast#103

We could try to replace pylibmc with python-memcached. The latter seems to be thread-safe, while the former is not.
On the other hand, python-memcached has not been worked on for four years and is less sophisticated than pylibmc, which is based on libmemcached. I'll have to investigate further.
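If we go that route, the Django side of the switch could look like the sketch below. This is only an idea, not our current settings; the LOCATION is taken from the host shown in the error messages, and whether Celery's result backend (which selects its own memcache client) would honour this choice still needs checking:

# settings.py -- sketch: use the python-memcached based backend that ships
# with this Django version instead of pylibmc
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "memcached:11211",  # host/port as seen in the error logs
    }
}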

Another idea: in all examples here, the error happens while handling the progress meter.
The progress meter handlers are somehow called with the wrong arguments (see zulip/zulip#755); they run backwards
for SDRP. So maybe there is some recursion or similar. I'll look at that first, it looks more promising.
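One option would be to make the progress callback defensive, so that a bad argument or a transient memcached hiccup cannot fail the whole analysis. This is just a sketch with hypothetical names, not the current code in topobank/analysis/functions.py:

import logging
import pylibmc

_log = logging.getLogger(__name__)

def make_progress_callback(progress_recorder):
    # Hypothetical guarded wrapper around celery_progress' ProgressRecorder.
    if progress_recorder is None:
        return None

    def callback(i, n):
        # Clamp the counter so a handler called with wrong arguments cannot
        # report progress outside 0..n.
        current = max(0, min(i + 1, n))
        try:
            progress_recorder.set_progress(current, n)
        except pylibmc.Error as exc:
            # A failed progress update should not abort the analysis itself.
            _log.warning("Could not store progress: %s", exc)

    return callback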

pastewka commented:

I don't understand why thread safety is an issue. Django is not multi-threaded (in synchronous mode at least), if I understand correctly.


mcrot commented Oct 28, 2021

Yes. The tasks run under Celery, which may handle things differently. However, I also don't think this is the issue; I guess it is related to the progress meter (#755). I'll have a look now.
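As a side note on Celery's concurrency: with the default prefork pool, one common precaution (an assumption on my side, not a confirmed cause here) is to make sure forked worker processes do not reuse connections opened by the parent. A minimal sketch:

from celery.signals import worker_process_init
from django.db import connections

@worker_process_init.connect
def reset_connections(**kwargs):
    # Force each forked worker process to open its own database connections
    # instead of inheriting sockets from the parent process.
    connections.close_all()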

@mcrot mcrot modified the milestones: v0.17.0, v0.18.0 Dec 1, 2021

mcrot commented Dec 6, 2021

Indeed, this is not a memory error. It sometimes happens again on the new machine with significantly larger memory, and also for a small measurement. It does not happen deterministically: when I restarted the related analyses, it worked (so far) in two cases; in both, the jumping progress meter (#755) was involved.
The issue zulip/zulip#755 will be fixed by PR zulip/zulip#758; let's check this again when that is online.


mcrot commented Jan 10, 2022

Currently this happens again and again and makes some analyses fail.
I suggest releasing an intermediate version without a DOI in order to test whether zulip/zulip#758 fixes this in production.

pastewka commented:

Sounds good, please release


mcrot commented Jan 14, 2022

Unfortunately, after installing 0.18, there are still problems causing a memcache error:

2022-01-14T15:38:40.373485066Z  2022-01-14 16:38:40,371 - bebad16a-8fad-48b2-8084-10534e93e76d - topobank.taskapp.tasks.perform_analysis - topobank.taskapp.tasks - WARNING - Exception while performing analysis 288876 (compatible? False): error 3 from memcached_get(celery-task-meta-bebad16a-8fad-4): (0x5641e839adb0) CONNECTION FAILURE(Connection reset by peer),  host: memcached:11211 -> libmemcached/io.cc:466
2022-01-14T15:38:40.407205734Z  [2022-01-14 16:38:40,402: ERROR/ForkPoolWorker-13] Task topobank.taskapp.tasks.perform_analysis[bebad16a-8fad-48b2-8084-10534e93e76d] raised unexpected: InterfaceError('connection already closed')
2022-01-14T15:38:40.407293852Z  Traceback (most recent call last):
2022-01-14T15:38:40.407327641Z    File "/app/topobank/taskapp/tasks.py", line 131, in perform_analysis
2022-01-14T15:38:40.407339497Z      result = analysis.function.eval(subject, **kwargs)
2022-01-14T15:38:40.407347584Z    File "/app/topobank/analysis/models.py", line 288, in eval
2022-01-14T15:38:40.407355234Z      return self.get_implementation(subject_type).eval(subject, **kwargs)
2022-01-14T15:38:40.407363483Z    File "/app/topobank/analysis/models.py", line 340, in eval
2022-01-14T15:38:40.407371530Z      return pyfunc(subject, **kwargs)
2022-01-14T15:38:40.407401724Z    File "/app/topobank/analysis/functions.py", line 1188, in contact_mechanics
2022-01-14T15:38:40.407412712Z      progress_recorder.set_progress(i + 1, nsteps)
2022-01-14T15:38:40.407419899Z    File "/opt/conda/lib/python3.7/site-packages/celery_progress/backend.py", line 48, in set_progress
2022-01-14T15:38:40.407443844Z      meta=meta
2022-01-14T15:38:40.407455886Z    File "/opt/conda/lib/python3.7/site-packages/celery/app/task.py", line 977, in update_state
2022-01-14T15:38:40.407463795Z      task_id, meta, state, request=self.request, **kwargs)
2022-01-14T15:38:40.407470909Z    File "/opt/conda/lib/python3.7/site-packages/celery/backends/base.py", line 529, in store_result
2022-01-14T15:38:40.407478509Z      request=request, **kwargs)
2022-01-14T15:38:40.407485681Z    File "/opt/conda/lib/python3.7/site-packages/celery/backends/base.py", line 956, in _store_result
2022-01-14T15:38:40.407526910Z      current_meta = self._get_task_meta_for(task_id)
2022-01-14T15:38:40.407535698Z    File "/opt/conda/lib/python3.7/site-packages/celery/backends/base.py", line 978, in _get_task_meta_for
2022-01-14T15:38:40.407543194Z      meta = self.get(self.get_key_for_task(task_id))
2022-01-14T15:38:40.407550520Z    File "/opt/conda/lib/python3.7/site-packages/celery/backends/cache.py", line 120, in get
2022-01-14T15:38:40.407566091Z      return self.client.get(key)
2022-01-14T15:38:40.407587639Z  pylibmc.ConnectionError: error 3 from memcached_get(celery-task-meta-bebad16a-8fad-4): (0x5641e839adb0) CONNECTION FAILURE(Connection reset by peer),  host: memcached:11211 -> libmemcached/io.cc:466
2022-01-14T15:38:40.407597476Z  
2022-01-14T15:38:40.407770114Z  During handling of the above exception, another exception occurred:
2022-01-14T15:38:40.407793010Z  
2022-01-14T15:38:40.407800697Z  Traceback (most recent call last):
2022-01-14T15:38:40.407807979Z    File "/opt/conda/lib/python3.7/site-packages/django/db/backends/utils.py", line 84, in _execute
2022-01-14T15:38:40.407815022Z      return self.cursor.execute(sql, params)
2022-01-14T15:38:40.407832817Z  psycopg2.OperationalError: server closed the connection unexpectedly
2022-01-14T15:38:40.407841787Z  	This probably means the server terminated abnormally
2022-01-14T15:38:40.407849069Z  	before or while processing the request.
2022-01-14T15:38:40.407856819Z  
2022-01-14T15:38:40.407864330Z  
2022-01-14T15:38:40.407871446Z  The above exception was the direct cause of the following exception:
2022-01-14T15:38:40.407879266Z  
2022-01-14T15:38:40.407907379Z  Traceback (most recent call last):
2022-01-14T15:38:40.407918003Z    File "/app/topobank/taskapp/tasks.py", line 145, in perform_analysis
2022-01-14T15:38:40.407925990Z      Analysis.FAILURE)
2022-01-14T15:38:40.407932675Z    File "/app/topobank/taskapp/tasks.py", line 117, in save_result
2022-01-14T15:38:40.407939673Z      analysis.save()
2022-01-14T15:38:40.407946081Z    File "/opt/conda/lib/python3.7/site-packages/django/db/models/base.py", line 740, in save
2022-01-14T15:38:40.407953857Z      force_update=force_update, update_fields=update_fields)
2022-01-14T15:38:40.407968167Z    File "/opt/conda/lib/python3.7/site-packages/django/db/models/base.py", line 778, in save_base
2022-01-14T15:38:40.407975767Z      force_update, using, update_fields,
2022-01-14T15:38:40.407982224Z    File "/opt/conda/lib/python3.7/site-packages/django/db/models/base.py", line 859, in _save_table
2022-01-14T15:38:40.407989349Z      forced_update)
2022-01-14T15:38:40.407995961Z    File "/opt/conda/lib/python3.7/site-packages/django/db/models/base.py", line 912, in _do_update
2022-01-14T15:38:40.408003631Z      return filtered._update(values) > 0
2022-01-14T15:38:40.408017434Z    File "/opt/conda/lib/python3.7/site-packages/django/db/models/query.py", line 802, in _update
2022-01-14T15:38:40.408025762Z      return query.get_compiler(self.db).execute_sql(CURSOR)
2022-01-14T15:38:40.408033361Z    File "/opt/conda/lib/python3.7/site-packages/django/db/models/sql/compiler.py", line 1559, in execute_sql
2022-01-14T15:38:40.408040334Z      cursor = super().execute_sql(result_type)
2022-01-14T15:38:40.408046816Z    File "/opt/conda/lib/python3.7/site-packages/django/db/models/sql/compiler.py", line 1175, in execute_sql
2022-01-14T15:38:40.408053967Z      cursor.execute(sql, params)
2022-01-14T15:38:40.408060462Z    File "/opt/conda/lib/python3.7/site-packages/django/db/backends/utils.py", line 66, in execute
2022-01-14T15:38:40.408090198Z      return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
2022-01-14T15:38:40.408099418Z    File "/opt/conda/lib/python3.7/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
2022-01-14T15:38:40.408106987Z      return executor(sql, params, many, context)
2022-01-14T15:38:40.408119288Z    File "/opt/conda/lib/python3.7/site-packages/django/db/backends/utils.py", line 84, in _execute
2022-01-14T15:38:40.408127656Z      return self.cursor.execute(sql, params)
2022-01-14T15:38:40.408135135Z    File "/opt/conda/lib/python3.7/site-packages/django/db/utils.py", line 90, in __exit__
2022-01-14T15:38:40.408141838Z      raise dj_exc_value.with_traceback(traceback) from exc_value
2022-01-14T15:38:40.408156492Z    File "/opt/conda/lib/python3.7/site-packages/django/db/backends/utils.py", line 84, in _execute
2022-01-14T15:38:40.408165422Z      return self.cursor.execute(sql, params)
2022-01-14T15:38:40.408171687Z  django.db.utils.OperationalError: server closed the connection unexpectedly
2022-01-14T15:38:40.408184221Z  	This probably means the server terminated abnormally
2022-01-14T15:38:40.408192096Z  	before or while processing the request.
2022-01-14T15:38:40.408198584Z  
2022-01-14T15:38:40.408204340Z  
2022-01-14T15:38:40.408216594Z  During handling of the above exception, another exception occurred:
2022-01-14T15:38:40.408224052Z  
2022-01-14T15:38:40.408230655Z  Traceback (most recent call last):
2022-01-14T15:38:40.408237207Z    File "/opt/conda/lib/python3.7/site-packages/django/db/backends/base/base.py", line 237, in _cursor
2022-01-14T15:38:40.408250049Z      return self._prepare_cursor(self.create_cursor(name))
2022-01-14T15:38:40.408257988Z    File "/opt/conda/lib/python3.7/site-packages/django/utils/asyncio.py", line 33, in inner
2022-01-14T15:38:40.408264670Z      return func(*args, **kwargs)
2022-01-14T15:38:40.408270537Z    File "/opt/conda/lib/python3.7/site-packages/django/db/backends/postgresql/base.py", line 236, in create_cursor
2022-01-14T15:38:40.408282805Z      cursor = self.connection.cursor()
2022-01-14T15:38:40.408290280Z  psycopg2.InterfaceError: connection already closed
2022-01-14T15:38:40.408296445Z  
2022-01-14T15:38:40.408301989Z  The above exception was the direct cause of the following exception:
2022-01-14T15:38:40.408308142Z  
2022-01-14T15:38:40.408328829Z  Traceback (most recent call last):
2022-01-14T15:38:40.408344774Z    File "/opt/conda/lib/python3.7/site-packages/celery/app/trace.py", line 451, in trace_task
2022-01-14T15:38:40.408352255Z      R = retval = fun(*args, **kwargs)
2022-01-14T15:38:40.408358417Z    File "/opt/conda/lib/python3.7/site-packages/celery/app/trace.py", line 734, in __protected_call__
2022-01-14T15:38:40.408364412Z      return self.run(*args, **kwargs)
2022-01-14T15:38:40.408388479Z    File "/app/topobank/taskapp/tasks.py", line 152, in perform_analysis
2022-01-14T15:38:40.408396731Z      analysis = Analysis.objects.get(id=analysis_id)
2022-01-14T15:38:40.408409257Z    File "/opt/conda/lib/python3.7/site-packages/django/db/models/manager.py", line 85, in manager_method
2022-01-14T15:38:40.408417098Z      return getattr(self.get_queryset(), name)(*args, **kwargs)
2022-01-14T15:38:40.408423407Z    File "/opt/conda/lib/python3.7/site-packages/django/db/models/query.py", line 431, in get
2022-01-14T15:38:40.408430071Z      num = len(clone)
2022-01-14T15:38:40.408436125Z    File "/opt/conda/lib/python3.7/site-packages/django/db/models/query.py", line 262, in __len__
2022-01-14T15:38:40.408442362Z      self._fetch_all()
2022-01-14T15:38:40.408454212Z    File "/opt/conda/lib/python3.7/site-packages/django/db/models/query.py", line 1324, in _fetch_all
2022-01-14T15:38:40.408462245Z      self._result_cache = list(self._iterable_class(self))
2022-01-14T15:38:40.408468359Z    File "/opt/conda/lib/python3.7/site-packages/django/db/models/query.py", line 51, in __iter__
2022-01-14T15:38:40.408477159Z      results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
2022-01-14T15:38:40.408487503Z    File "/opt/conda/lib/python3.7/site-packages/django/db/models/sql/compiler.py", line 1173, in execute_sql
2022-01-14T15:38:40.408493562Z      cursor = self.connection.cursor()
2022-01-14T15:38:40.408498942Z    File "/opt/conda/lib/python3.7/site-packages/django/utils/asyncio.py", line 33, in inner
2022-01-14T15:38:40.408504322Z      return func(*args, **kwargs)
2022-01-14T15:38:40.408521537Z    File "/opt/conda/lib/python3.7/site-packages/django/db/backends/base/base.py", line 259, in cursor
2022-01-14T15:38:40.408532114Z      return self._cursor()
2022-01-14T15:38:40.408538045Z    File "/opt/conda/lib/python3.7/site-packages/django/db/backends/base/base.py", line 237, in _cursor
2022-01-14T15:38:40.408543330Z      return self._prepare_cursor(self.create_cursor(name))
2022-01-14T15:38:40.408548500Z    File "/opt/conda/lib/python3.7/site-packages/django/db/utils.py", line 90, in __exit__
2022-01-14T15:38:40.408554233Z      raise dj_exc_value.with_traceback(traceback) from exc_value
2022-01-14T15:38:40.408564934Z    File "/opt/conda/lib/python3.7/site-packages/django/db/backends/base/base.py", line 237, in _cursor
2022-01-14T15:38:40.408571933Z      return self._prepare_cursor(self.create_cursor(name))
2022-01-14T15:38:40.408576967Z    File "/opt/conda/lib/python3.7/site-packages/django/utils/asyncio.py", line 33, in inner
2022-01-14T15:38:40.408582006Z      return func(*args, **kwargs)
2022-01-14T15:38:40.408587145Z    File "/opt/conda/lib/python3.7/site-packages/django/db/backends/postgresql/base.py", line 236, in create_cursor
2022-01-14T15:38:40.408592090Z      cursor = self.connection.cursor()

Shifting to 0.19.

@mcrot mcrot modified the milestones: v0.18.0, v0.19.0 Jan 14, 2022

mcrot commented Jan 21, 2022

This is probably related; similar issues appeared in the Zulip project:

zulip/zulip#14456
zulip/docker-zulip#426
zulip/zulip#10776

This is maybe a blueprint for a fix:
zulip/zulip@b312001
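I have not verified what that commit changes in detail; one way to make transient memcached failures non-fatal would be a small retry helper along these lines (sketch with hypothetical names):

import time
import pylibmc

def with_cache_retry(func, retries=3, delay=0.5):
    # Call func, retrying a few times on pylibmc errors before giving up.
    for attempt in range(retries):
        try:
            return func()
        except pylibmc.Error:
            if attempt == retries - 1:
                raise
            time.sleep(delay)

# usage idea: with_cache_retry(lambda: progress_recorder.set_progress(i + 1, n))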

pastewka commented:

From here: moby/moby#31208

Our workaround is to set the database service to endpoint mode dnsrr and to disable the Netty HTTP connection pooling. (The database will not be a Swarm service in production anyway.)

pastewka commented:

Also here: vapor/postgres-kit#164 (comment)

db:
  image: postgres:12.4
  sysctls:
    # NOTES: these values are needed here because docker swarm kills long running idle 
    # connections by default after 15 minutes see https://github.com/moby/moby/issues/31208
    # info about these values are here https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html
    - net.ipv4.tcp_keepalive_intvl=600
    - net.ipv4.tcp_keepalive_probes=9
    - net.ipv4.tcp_keepalive_time=600


mcrot commented Jan 21, 2022

Thanks. I'm already using this endpoint mode, at least for rabbitmq and memcached, but not yet for the celeryworker (there were problems); I will try again.

I will definitely try the keepalive params, this looks promising!

@mcrot mcrot modified the milestones: v0.19.0, 0.18.2 Jan 26, 2022
@mcrot mcrot closed this as completed in f3d1935 Jan 26, 2022

mcrot commented Jan 28, 2022

Although I'm using the keepalive parameters now, this still happens, at least when I try to generate thumbnails for the largest topography (8k x 8k):

2022-01-28T09:22:13.277798263Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    | [2022-01-28 10:22:13,276: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
2022-01-28T09:22:13.277866126Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    | Traceback (most recent call last):
2022-01-28T09:22:13.277880356Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |   File "/opt/conda/lib/python3.7/site-packages/celery/worker/consumer/consumer.py", line 326, in start
2022-01-28T09:22:13.277890431Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |     blueprint.start(self)
2022-01-28T09:22:13.277899439Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |   File "/opt/conda/lib/python3.7/site-packages/celery/bootsteps.py", line 116, in start
2022-01-28T09:22:13.277908193Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |     step.start(parent)
2022-01-28T09:22:13.277916646Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |   File "/opt/conda/lib/python3.7/site-packages/celery/worker/consumer/consumer.py", line 618, in start
2022-01-28T09:22:13.277925586Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |     c.loop(*c.loop_args())
2022-01-28T09:22:13.277933882Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |   File "/opt/conda/lib/python3.7/site-packages/celery/worker/loops.py", line 97, in asynloop
2022-01-28T09:22:13.277942465Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |     next(loop)
2022-01-28T09:22:13.277950791Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |   File "/opt/conda/lib/python3.7/site-packages/kombu/asynchronous/hub.py", line 362, in create_loop
2022-01-28T09:22:13.277959475Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |     cb(*cbargs)
2022-01-28T09:22:13.277967710Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |   File "/opt/conda/lib/python3.7/site-packages/kombu/transport/base.py", line 235, in on_readable
2022-01-28T09:22:13.277977172Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |     reader(loop)
2022-01-28T09:22:13.277985477Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |   File "/opt/conda/lib/python3.7/site-packages/kombu/transport/base.py", line 217, in _read
2022-01-28T09:22:13.277994017Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |     drain_events(timeout=0)
2022-01-28T09:22:13.278003115Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |   File "/opt/conda/lib/python3.7/site-packages/amqp/connection.py", line 522, in drain_events
2022-01-28T09:22:13.278011611Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |     while not self.blocking_read(timeout):
2022-01-28T09:22:13.278021170Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |   File "/opt/conda/lib/python3.7/site-packages/amqp/connection.py", line 528, in blocking_read
2022-01-28T09:22:13.278030639Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |     return self.on_inbound_frame(frame)
2022-01-28T09:22:13.278038901Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |   File "/opt/conda/lib/python3.7/site-packages/amqp/method_framing.py", line 53, in on_frame
2022-01-28T09:22:13.278048230Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |     callback(channel, method_sig, buf, None)
2022-01-28T09:22:13.278070183Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |   File "/opt/conda/lib/python3.7/site-packages/amqp/connection.py", line 535, in on_inbound_method
2022-01-28T09:22:13.278079117Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |     method_sig, payload, content,
2022-01-28T09:22:13.278087273Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |   File "/opt/conda/lib/python3.7/site-packages/amqp/abstract_channel.py", line 143, in dispatch_method
2022-01-28T09:22:13.278096193Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |     listener(*args)
2022-01-28T09:22:13.278104179Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |   File "/opt/conda/lib/python3.7/site-packages/amqp/connection.py", line 663, in _on_close
2022-01-28T09:22:13.278112671Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |     self._x_close_ok()
2022-01-28T09:22:13.278120807Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |   File "/opt/conda/lib/python3.7/site-packages/amqp/connection.py", line 678, in _x_close_ok
2022-01-28T09:22:13.278129610Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |     self.send_method(spec.Connection.CloseOk, callback=self._on_close_ok)
2022-01-28T09:22:13.278189949Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |   File "/opt/conda/lib/python3.7/site-packages/amqp/abstract_channel.py", line 57, in send_method
2022-01-28T09:22:13.278203674Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |     conn.frame_writer(1, self.channel_id, sig, args, content)
2022-01-28T09:22:13.278211292Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |   File "/opt/conda/lib/python3.7/site-packages/amqp/method_framing.py", line 183, in write_frame
2022-01-28T09:22:13.278219033Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |     write(view[:offset])
2022-01-28T09:22:13.278226016Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |   File "/opt/conda/lib/python3.7/site-packages/amqp/transport.py", line 363, in write
2022-01-28T09:22:13.278233467Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    |     self._write(s)
2022-01-28T09:22:13.278241385Z prodstack_celeryworker.1.s9mqw4xgrvtn@analyses    | ConnectionResetError: [Errno 104] Connection reset by peer

Maybe related to zulip/zulip#534.

@mcrot mcrot reopened this Jan 28, 2022
@mcrot mcrot modified the milestones: 0.18.2, v0.19.0 Jan 28, 2022
pastewka commented:

We should remove Selenium. Do you need help doing that?

@mcrot mcrot modified the milestones: v0.19.0, v.0.20.0 Mar 23, 2022

mcrot commented Mar 30, 2022

Memcached and RabbitMQ have been replaced by Redis.
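For reference, this means Celery now uses Redis for both the broker and the result backend, along these lines (the URLs and database numbers below are placeholders, not our production values):

from celery import Celery

app = Celery(
    "topobank",
    broker="redis://redis:6379/0",   # replaces RabbitMQ as message broker
    backend="redis://redis:6379/1",  # replaces the memcached result backend
)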
Probably the main problem was a misconfiguration of supervisord, which led to continuous restarts of memcached and rabbitmq. Closing.
