Asynchronously load next records in ActiveRecordCursor #344

Open: 1 commit to be merged into main

@odlp odlp commented Feb 21, 2023

It occurred to me the other day that Job Iteration could benefit from Rails 7's async queries. Whilst the job is doing HardWork™️ in #each_iteration, we can begin fetching the next batch of records on another thread. Although most query times are probably < 100ms, this does add up for a job performing thousands or millions of iterations.
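
As a rough illustration of that overlap, here is a minimal sketch in plain Ruby. It uses ordinary threads rather than Rails' `load_async`, and it is not this branch's code: `PrefetchingBatches` and its fetch block are hypothetical names made up for the example. The fetch block is assumed to take the previous cursor (or `nil`) and return `[batch, next_cursor]`, with an empty batch marking the end.

```ruby
# Hypothetical illustration only: not the ActiveRecordCursor implementation.
class PrefetchingBatches
  include Enumerable

  # The block receives the last cursor (nil for the first call) and must
  # return [batch, next_cursor]. An empty batch ends the iteration.
  def initialize(&fetch)
    @fetch = fetch
  end

  def each
    return enum_for(:each) unless block_given?

    pending = Thread.new { @fetch.call(nil) }
    loop do
      batch, cursor = pending.value # blocks until the background fetch is done
      break if batch.empty?

      # Start fetching the next batch first, then hand the current one over,
      # so the fetch and the caller's work run at the same time.
      pending = Thread.new(cursor) { |c| @fetch.call(c) }
      yield batch
    end
  end
end

rows = (1..6).to_a
batches = PrefetchingBatches.new do |cursor|
  sleep(0.05) # stand-in for query latency
  page = rows.select { |id| id > (cursor || 0) }.first(2)
  [page, page.last]
end

batches.each do |batch|
  sleep(0.05) # stand-in for the work done in each_iteration
  p batch
end
```

Note that one fetch is always in flight, so a consumer that stops early leaves a fetch whose result is never used.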

Example

I've given this branch a spin locally with a slow job & query:

class User < ApplicationRecord
  scope :slow, -> { select("*, SLEEP(0.5)") } # Sleeps 0.5s per returned row
end

class SlowJob < ApplicationJob
  include JobIteration::Iteration

  def build_enumerator(cursor:)
    enumerator_builder.active_record_on_batches(
      User.slow,
      batch_size: 2,
      cursor: cursor
    )
  end

  def each_iteration(batch)
    puts Time.now
    sleep(1)
  end
end

With async loading (note 1 second between printed times):

User Load (1006.2ms)  SELECT *, SLEEP(0.5) FROM `users` ORDER BY users.id LIMIT 2
2023-02-21 15:42:31 +0000
ASYNC User Load (24.5ms) (db time 1005.5ms)  SELECT *, SLEEP(0.5) FROM `users` WHERE (users.id > '2') ORDER BY users.id LIMIT 2
2023-02-21 15:42:32 +0000
ASYNC User Load (10.9ms) (db time 1006.3ms)  SELECT *, SLEEP(0.5) FROM `users` WHERE (users.id > '4') ORDER BY users.id LIMIT 2
2023-02-21 15:42:33 +0000
ASYNC User Load (12.0ms) (db time 1007.5ms)  SELECT *, SLEEP(0.5) FROM `users` WHERE (users.id > '6') ORDER BY users.id LIMIT 2
2023-02-21 15:42:34 +0000
ASYNC User Load (11.9ms) (db time 1007.1ms)  SELECT *, SLEEP(0.5) FROM `users` WHERE (users.id > '8') ORDER BY users.id LIMIT 2
...

Without async loading (approx. 2 seconds between printed times):

User Load (1006.2ms)  SELECT *, SLEEP(0.5) FROM `users` ORDER BY users.id LIMIT 2
2023-02-21 15:44:11 +0000
User Load (1006.3ms)  SELECT *, SLEEP(0.5) FROM `users` WHERE (users.id > '2') ORDER BY users.id LIMIT 2
2023-02-21 15:44:13 +0000
User Load (1007.0ms)  SELECT *, SLEEP(0.5) FROM `users` WHERE (users.id > '4') ORDER BY users.id LIMIT 2
2023-02-21 15:44:15 +0000
User Load (1005.4ms)  SELECT *, SLEEP(0.5) FROM `users` WHERE (users.id > '6') ORDER BY users.id LIMIT 2
2023-02-21 15:44:17 +0000
User Load (1008.3ms)  SELECT *, SLEEP(0.5) FROM `users` WHERE (users.id > '8') ORDER BY users.id LIMIT 2
...

In this contrived example the total time to iterate over 112 records was ~58s with async, ~113s without async.
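
Those totals line up with a simple back-of-the-envelope model. The sketch below assumes each batch query takes roughly 1s (2 rows at 0.5s each) and each iteration does 1s of work; the method names are only for illustration, and the rounding ignores the few milliseconds of overhead visible in the logs.

```ruby
# Rough timing model for the contrived example above (all values in seconds).

# Without async: every batch pays for its query and then for its work.
def sequential_time(batches:, query_time:, work_time:)
  batches * (query_time + work_time)
end

# With async: only the first query is waited on up front. After that, each
# batch costs whichever is slower, the next query or the current work.
def overlapped_time(batches:, query_time:, work_time:)
  query_time + batches * [query_time, work_time].max
end

batches = 112 / 2 # 112 records in batches of 2

puts sequential_time(batches: batches, query_time: 1.0, work_time: 1.0) # prints 112.0
puts overlapped_time(batches: batches, query_time: 1.0, work_time: 1.0) # prints 57.0
```

The model gives about 112s and 57s, close to the measured ~113s and ~58s. It also shows that the saving per batch is bounded by the shorter of the query time and the work time.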

cursor = ActiveRecordCursor.new(Product.all)
cursor.next_batch(2)

assert_predicate(cursor.next_relation, :scheduled?)

@odlp odlp Feb 21, 2023

On the fence about whether cursor should have an attribute reader for next_relation - it makes testing easier, but perhaps this should be kept private as internal state.

We could also verify the async loading behaviour with mocha:

ActiveRecord::Relation.any_instance.expects(:load_async)

But this felt awkward too, in its own way...

@odlp odlp marked this pull request as ready for review February 21, 2023 16:25

@Mangara Mangara left a comment

This is a really neat idea! I'm not too familiar with the async query mechanism, though, so I'll leave a full review to @etiennebarrie.
