
This pattern vs splitting into multiple jobs? #242

Open
mollerhoj opened this issue Jun 30, 2022 · 2 comments

mollerhoj commented Jun 30, 2022

Perhaps something worth mentioning in the README / tutorials?

I'm not sure I understand the advantage of this pattern over splitting these long jobs into multiple small jobs.

Since the enumerator here uses a cursor, it seems fairly simple to have each finished job enqueue the next one, recursively, with each job taking the cursor as an argument. Am I missing something?

I guess one advantage is that we avoid the overhead of starting multiple jobs (unless there's an interruption). But that should be a minimal performance win, and one could process multiple records in a single job (batching) to reduce the job count anyway.

In short, are there advantages to this pattern that I'm missing?
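The recursive hand-off described above could be sketched like this. This is a toy in-memory model, not real Sidekiq: `CleanupJob`, the fake `QUEUE`, the dataset, and `BATCH_SIZE` are all hypothetical stand-ins for a worker class and a Redis-backed queue.

```ruby
# Toy model of "each job processes a batch, then enqueues the next cursor".
QUEUE      = []            # stands in for the Sidekiq queue
RECORDS    = (1..10).to_a  # stands in for the dataset being iterated
BATCH_SIZE = 3
PROCESSED  = []

class CleanupJob
  def self.perform_async(cursor = 0)
    QUEUE << cursor
  end

  def perform(cursor)
    batch = RECORDS.drop(cursor).first(BATCH_SIZE)
    return if batch.empty?               # dataset exhausted: stop recursing
    batch.each { |record| PROCESSED << record }  # the real work goes here
    self.class.perform_async(cursor + batch.size)  # hand off to the next job
  end
end

# Drain the fake queue the way a worker process would.
CleanupJob.perform_async
CleanupJob.new.perform(QUEUE.shift) until QUEUE.empty?
```

Each enqueued job carries only the cursor, so any single job is small; the trade-off is one enqueue round-trip per batch.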

@mollerhoj mollerhoj changed the title This pattern vs splitting into multiple sidekiq jobs? This pattern vs splitting into multiple jobs? Jul 4, 2022
@fatkodima (Contributor)

I described a few advantages here - https://github.com/fatkodima/sidekiq-iteration#faq

Mangara (Contributor) commented Jul 21, 2023

If you're thinking of these small jobs running serially, I'm not sure I see the difference: that is essentially what job-iteration does, except it does so automatically, adapts to the size of the workload, and uses the same job_id for the entire duration, which makes the work easier to track through logs and tracing.

If you're thinking of running the small jobs in parallel, that is indeed very different and has different trade-offs: the work gets done much faster, but it also puts that much more strain on underlying resources (the database, API calls, etc.). If you process each iteration in its own job, there is a large amount of overhead involved in enqueuing potentially millions of jobs versus one job iterating that many times. And if you batch iterations, it is tricky to find the right degree of parallelism.

In the end, the serial model job-iteration uses is both simple, and good enough for many use cases.
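As a rough illustration of that serial model, here is a simplified simulation (not job-iteration's actual API or internals): the job checkpoints a cursor after every iteration, so an interrupted run can resume exactly where it stopped, under the same job id. The class name and method names below are invented for the sketch.

```ruby
# Simplified model of an interruptible, cursor-checkpointing job.
class IteratingJob
  attr_reader :job_id, :cursor, :processed

  def initialize(records, job_id: "job-1")
    @records   = records
    @job_id    = job_id    # stays constant across interruptions/resumes
    @cursor    = 0         # checkpoint: index of the next record to process
    @processed = []
  end

  # Runs until done, or until interrupt_after iterations have elapsed
  # (standing in for a worker shutdown signal). Safe to call again to resume.
  def run(interrupt_after: Float::INFINITY)
    steps = 0
    while @cursor < @records.size
      return :interrupted if steps >= interrupt_after
      @processed << @records[@cursor]  # the each_iteration-style work
      @cursor += 1                     # advance the checkpoint
      steps += 1
    end
    :completed
  end
end
```

A run that gets interrupted after two iterations simply resumes from `cursor == 2` on the next call, with `job_id` unchanged, which is the property Mangara highlights for log/trace continuity.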
