Max iters per instance + enqueue after shutdown #99
I'm opening this up as a draft PR just to get some feedback, as these changes may not be aligned with the mainline goals of this gem; if so, I will maintain this fork myself and use it for my own purposes. I don't believe these changes are ready to be merged yet without unit test coverage, but I would like to see if there's interest in adding them before I bother writing any tests.
Summary of the change
I wanted to achieve two goals (rough usage sketch below):
- Re-enqueue the job after the `on_shutdown` hook has completed.
- Allow `max_iters_per_run` to be set in the class body.
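Roughly, the usage I have in mind looks like this. Treat it as illustrative only: `ExportJob`, `User`, and the cap of 500 are placeholders, and the setter syntax shown is indicative rather than final.

```ruby
# Rough usage sketch; names other than max_iters_per_run are made up.
class ExportJob < ActiveJob::Base
  include JobIteration::Iteration

  # Proposed: cap how many iterations a single run performs. When the cap is
  # reached the run shuts down, on_shutdown fires, and the job is re-enqueued
  # to continue from its last cursor.
  max_iters_per_run 500

  def build_enumerator(cursor:)
    enumerator_builder.active_record_on_records(User.all, cursor: cursor)
  end

  def each_iteration(user)
    # one unit of work per record
  end

  def on_shutdown
    # runs before the job is re-enqueued
  end
end
```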
Reasons Why I Made this Change
I needed these changes for my use case of exporting data to a CSV file from a Heroku application. Since Heroku file storage is ephemeral, my strategy was to buffer lines of the CSV in memory and flush the bytes to an Azure Blob Storage file, appending them. At a high level my job does the following (rough sketch after this list):
- In `each_iteration` it generates a line of the CSV and adds it to an in-memory buffer.
- In the `on_shutdown` callback it flushes the buffered bytes to the storage blob with an API call.
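Condensed, those two hooks look roughly like this; the buffer plumbing, `row_for`, and `blob_client.append_blob_block` stand in for my real Azure client code and aren't part of this PR:

```ruby
require "csv"

def each_iteration(record)
  # Generate one CSV row and keep it in memory; nothing touches the dyno's
  # ephemeral filesystem here.
  (@csv_buffer ||= []) << CSV.generate_line(row_for(record)) # row_for is my own helper
end

def on_shutdown
  return if @csv_buffer.nil? || @csv_buffer.empty?

  # Commit this run's unit of work: append the buffered bytes to the blob.
  blob_client.append_blob_block(container_name, blob_name, @csv_buffer.join) # stand-in for the real Azure call
  @csv_buffer.clear
end
```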
My reasoning for flushing bytes in `on_shutdown` is that I would like each run of the job to be a single unit of work. Think of flushing the bytes as committing that unit of work to the file. If there's an error in one of the iterations, I wouldn't want the job's view of how many rows have been flushed to the file to become mismatched with where it is in the enumerator.

Therefore, if there's any error generating rows in the CSV, or a network error while flushing the bytes, the job can safely restart at the same cursor value and regenerate and flush those same bytes again. This isn't a perfect guarantee of success, but I check for a content length mismatch in `build_enumerator` and raise an error if it happens.
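For completeness, that guard looks roughly like this; `blob_client`, `records_to_export`, `expected_length_for`, and `ContentLengthMismatchError` are simplified stand-ins for my actual code:

```ruby
def build_enumerator(cursor:)
  if cursor
    # The blob must be exactly as long as the bytes we believe we have already
    # flushed; otherwise a retry could append duplicate or missing rows.
    actual = blob_client.get_blob_properties(container_name, blob_name).properties[:content_length] # stand-in call
    expected = expected_length_for(cursor)
    raise ContentLengthMismatchError, "blob is #{actual} bytes, expected #{expected}" if actual != expected
  end

  enumerator_builder.active_record_on_records(records_to_export, cursor: cursor)
end
```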