
Max iters per instance + enqueue after shutdown #99

Draft
wants to merge 2 commits into main

Conversation

GabrielAlacchi

I'm opening this as a draft PR to get some feedback, since these changes may not be aligned with the mainline goals of this gem; if they aren't, I'll maintain this fork myself for my own purposes. I don't believe these changes are ready to be merged without unit test coverage, but I'd like to gauge interest before I write any.

Summary of the change

I wanted to achieve two goals:

  1. Don't enqueue the next instance of the job until the on_shutdown hook has completed.
  2. Allow setting a hard cap on the number of iterations that can run before interrupting the job. This is optional and is set by calling max_iters_per_run in the class body (see the sketch below).
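
To give a concrete picture of goal 2, here's a minimal sketch of how the proposed setting would be used. The job class, the model, and the cap of 500 are placeholders I've made up for illustration; the only new piece is the `max_iters_per_run` macro from this PR.

```ruby
require "active_job"
require "job-iteration"

class ExportRowsJob < ActiveJob::Base
  include JobIteration::Iteration

  # Proposed in this PR: interrupt the run after at most 500 iterations and
  # re-enqueue the job so it resumes from the last cursor position.
  max_iters_per_run 500

  def build_enumerator(_params, cursor:)
    enumerator_builder.active_record_on_records(Product.all, cursor: cursor)
  end

  def each_iteration(product, _params)
    # One unit of work per record.
  end
end
```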

Reasons Why I Made this Change

I need these changes for my use case: exporting data to a CSV file from a Heroku application. Since Heroku's file storage is ephemeral, my strategy is to buffer CSV lines in memory and flush the bytes to an Azure Blob Storage file, appending to it as the job progresses. At a high level my job does the following:

  1. Creates a cursor that iterates over the models and preloads the necessary data.
  2. Iterates and builds a row for each model in each_iteration.
  3. In the on_shutdown callback, flushes the buffered bytes to the storage blob with an API call (sketched below).
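
To make that shape concrete, here's a rough sketch of the job. The model, the preload, and the helpers `row_for` and `append_to_blob` are simplified stand-ins for my real code, not part of the gem.

```ruby
require "active_job"
require "job-iteration"
require "csv"
require "stringio"

class CsvExportJob < ActiveJob::Base
  include JobIteration::Iteration

  def build_enumerator(export_id, cursor:)
    @buffer = StringIO.new
    enumerator_builder.active_record_on_records(
      Model.includes(:row_data), # hypothetical model and preload
      cursor: cursor,
    )
  end

  def each_iteration(record, _export_id)
    # Build one CSV row per record and append it to the in-memory buffer.
    @buffer << CSV.generate_line(row_for(record))
  end

  def on_shutdown
    # Commit this run's unit of work by appending the buffered bytes to the
    # Azure blob. With this PR, the next run is enqueued only after this hook
    # has finished.
    append_to_blob(@buffer.string) unless @buffer.nil? || @buffer.string.empty?
  end
end
```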

My reasoning for flushing bytes in on_shutdown is that I'd like each run of the job to be a single unit of work; think of flushing the bytes as committing that unit of work to the file. If an error occurs in one of the iterations, I don't want the job's view of how many rows have been flushed to the file to fall out of sync with where the cursor is in the enumerator.

Therefore, if there's an error while generating rows for the CSV, or a network error while flushing the bytes, the job can safely restart at the same cursor value and regenerate and flush those same bytes again. This isn't a perfect guarantee of correctness, but I check for a content length mismatch in build_enumerator and raise an error if one is detected.
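
For reference, that guard looks roughly like this, expanding the build_enumerator above. `current_blob_size` and `bytes_flushed_so_far` are hypothetical helpers standing in for the Azure blob-properties call and my own bookkeeping.

```ruby
def build_enumerator(export_id, cursor:)
  # If the blob's actual size doesn't match the number of bytes this job
  # believes it has already committed, a flush was partially applied; abort
  # instead of appending duplicate or misaligned rows.
  if current_blob_size(export_id) != bytes_flushed_so_far(export_id)
    raise "Content length mismatch for export #{export_id}"
  end

  @buffer = StringIO.new
  enumerator_builder.active_record_on_records(Model.all, cursor: cursor)
end
```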
