Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JobIteration::DestroyAssociationJob #140

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

bdewater
Copy link
Contributor

@bdewater bdewater commented Oct 30, 2021

Active Record 6.1 introduced the dependent: :destroy_async option for associations with a configurable job to asynchronously destroy related models. This PR ports this job to use the Iteration API, making it interruptible and resumable.

Open questions:

  • Instead of active_record_on_records, should this job use active_record_on_batches with a batch size of e.g. 100?
  • Should we add a Railtie to make the gem configure destroy_association_async_job automatically?

Closes #139

@bdewater bdewater force-pushed the destroy_association_job branch 2 times, most recently from 3c79b4a to 6d790be Compare October 30, 2021 22:11
def destroy
update!(deleted: true)
run_callbacks(:destroy)
run_callbacks(:commit)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was not obvious for me why this was needed at first glance, figured I'd mention for reviewers: since rails/rails#41093 the before_destroy callback adds the job class and params to the parent model _after_commit_jobs array, then the after_commit callback actually enqueues the job.

@bdewater bdewater force-pushed the destroy_association_job branch 2 times, most recently from 4a97f97 to 3af3112 Compare October 31, 2021 00:40
Copy link

@tgwizard tgwizard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very nice, thank you for building it!

One related question, not for this PR so sorry about that: This async destroy machinery still requires the association record IDs to be loaded at owner object deletion time, and included in the job params. If an owner is associated with tens or hundreds of thousands of records, that might not work. In Shop we have a few Shop Users with hundreds of thousands of orders, these can't all be loaded in a single query if they ever were to delete their account. It'd be ideal then to just pass the association_class + owner_model_name + owner_id, and load all data in this job.

Sorry for rant, PR LGTM.

end

def each_iteration(record, _params)
record.destroy
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be destroy!? I believe that's not what's in the upstream Rails job, but otherwise some records might silently not be destroyed, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would get the job stuck if a single record cannot be deleted. It might be better to log the failures?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging for sure, perhaps even a AS::Notification topic to subscribe to so it can be hooked up to services like Bugsnag?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

destroy! would halt the deletion of the remaining objects, raise an error (which would be reported to e.g. Bugsnag), and then retry. If we then fix the deletion validation bug, the retries would automatically proceed and no custom retry or maintenance job would be required. If we fix the deletion validation bug too late, after retries are exhausted, then a maintenance task needs to be run.

If we log or AS::Notification, then all services need to configure that to report to their Bugsnag equivalent, then the bug needs to be fixed, then some maintenance task needs to be run to delete what was missed.

Either way, we leave data in the DB that should be deleted. My preference is for destroy!, leaving less manual toil, but I don't mind too much.

destroy is what the Rails job does, so perhaps that's what should be done here. Perhaps it'd be even nicer to use the bang version on associations if the owner destroy was called with a bang.

@Mangara
Copy link
Contributor

Mangara commented Nov 1, 2021

Instead of active_record_on_records, should this job use active_record_on_batches with a batch size of e.g. 100?

That's not necessary, active_record_on_records does this under the hood.

@bdewater
Copy link
Contributor Author

bdewater commented Nov 1, 2021

In Shop we have a few Shop Users with hundreds of thousands of orders, these can't all be loaded in a single query if they ever were to delete their account. It'd be ideal then to just pass the association_class + owner_model_name + owner_id, and load all data in this job.

Fair point, worth opening an issue or PR on the Rails repo?

That's not necessary, active_record_on_records does this under the hood.

Sorry, I meant deleting in batches using destroy_all instead of calling destroy many times.

@tgwizard
Copy link

tgwizard commented Nov 1, 2021

@bdewater there's minimal benefit on using destroy_all, it anyway needs to operate on each record individually (to trigger callbacks and such): https://github.com/rails/rails/blob/cf99be46c9aec7fe4576da7fc667c4d3994470d2/activerecord/lib/active_record/relation.rb#L585

@bdewater
Copy link
Contributor Author

rails/rails#44617 addresses the pain point raised by @tgwizard and made me realize I still had this PR open collecting dust 😅 will circle back to this soon!

@tgwizard
Copy link

@bdewater That PR is nice, though it doesn't solve the entire issue. If a Shop is deleted and the Shop has 1 million products, then

ids.each_slice(owner.class.destroy_association_async_batch_size || ids.size) do |ids_batch|
              enqueue_destroy_association(

will still not work. There's no way to query for 1 million records in a timely manner within a single unit of work. We'd have to enqueue the job "delete all products for Shop X" instead of "delete all products in batch (1, 2, 3)" + "delete all products in batch (4, 5, 6)".

@sambostock
Copy link
Contributor

Feels like the cascading deletion problem makes #63 relevant.

Active Record 6.1 introduced the 'dependent: :destroy_async' option for associations with a configurable job to asynchronously destroy related models. This PR ports this job to use the Iteration API, making it interruptible and resumable.
@bdewater bdewater force-pushed the destroy_association_job branch from 3af3112 to ad1e4c2 Compare July 21, 2022 16:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Active Record destroy async integration
4 participants