
WIP: Async and deferred loaders #173

Draft
wants to merge 2 commits into main from properly-async-and-deferred-loaders

Conversation

tgwizard
Contributor

@tgwizard tgwizard commented Oct 6, 2024

Very WIP.

This PR introduces two new concepts to loaders:

  1. A loader callback `perform_on_wait` that triggers whenever any batch loader promise starts waiting. This allows async work to be kicked off early in the lazy processing step (but still after the eager processing step has completed).
  2. A loader method `defer`, which tells the executor to resolve other, non-deferred loaders before the current one. This lets non-async/non-concurrent loaders run in a blocking manner while async loaders postpone blocking for their results until the last minute.

See the examples/http_loader.rb file for how you'd do this. (I haven't tested that it works; names are not final.)

This is inspired by https://graphql-ruby.org/dataloader/parallelism, which is a simpler solution but is based on fibers. Fibers are neat, but come with other issues (e.g. around thread/fiber-local state).

Before this PR, batch loaders could do concurrent operations within themselves. With this PR, batch loaders can do concurrent operations across all batch loaders.

(Actually, the http_loader example isn't great, since it's async/concurrent but not batching: it could start its futures in `load`, which runs in the eager phase of resolution. This PR allows batch loaders that are both async/concurrent and batching, and also concurrent with each other.)
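To illustrate the intended two-phase flow, here is a self-contained sketch with plain stdlib Threads standing in for concurrent futures. The class and harness are hypothetical; only the `perform_on_wait` / `perform` split mirrors what this PR proposes:

```ruby
# Sketch of the two-phase flow: kick off concurrent work when waiting
# begins (perform_on_wait), then block for the results as late as
# possible (perform). Threads stand in for HTTP futures.
class FakeAsyncLoader
  def initialize(keys)
    @keys = keys
    @threads = {}
  end

  # Would be invoked when any batch loader promise starts waiting:
  # start the concurrent work early, without blocking on it.
  def perform_on_wait
    @keys.each do |key|
      @threads[key] ||= Thread.new { key.to_s.upcase } # stand-in for an HTTP call
    end
  end

  # Would run last (after non-deferred loaders have resolved): now block.
  def perform
    @keys.to_h { |key| [key, @threads.fetch(key).value] }
  end
end

loader = FakeAsyncLoader.new([:a, :b])
loader.perform_on_wait   # work started, nothing blocks yet
p loader.perform         # => {:a=>"A", :b=>"B"}
```

The point of the split is that between `perform_on_wait` and `perform`, the executor is free to run other (blocking, non-async) loaders while the threads make progress in the background.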

@tgwizard tgwizard force-pushed the properly-async-and-deferred-loaders branch from fe9b4aa to 4027212 Compare October 6, 2024 11:39
@tgwizard tgwizard force-pushed the properly-async-and-deferred-loaders branch from 4027212 to 046a00b Compare October 7, 2024 07:45
@@ -53,6 +53,19 @@ def resolve(loader)
@loading = was_loading
end

def defer(_loader)
Contributor
This isn't used; is it needed at all?

Contributor Author

Yes, it's probably not needed; I added it to mirror `resolve`.

@@ -126,6 +136,36 @@ def fulfilled?(key)
promise.pending? && promise.source != self
end

def perform_on_wait(keys)
Contributor

If a loader opts into using perform_on_wait, should defer then always be called in perform? Or is there a situation where defer wouldn't be needed?

Contributor Author

Hmm, good question.

I can't think of any such situation. Given that the `perform_on_wait` hook is always called first, all the async loaders get their opportunity to run first, and they can then always defer and run at the very end. Perhaps we could simplify this so that the loader doesn't have to call `defer` and it happens automatically: perhaps a new module that a loader can include, like `AsyncAndDeferred`, which adds the `on_any_wait` hook etc.

Not entirely sure that's better.
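A rough sketch of what such an include-able module might look like. All names here are illustrative, not the PR's actual API; the idea is that prepending a small wrapper makes every `perform` defer automatically, so a loader author can't forget to do it:

```ruby
# Hypothetical mix-in: including it wraps perform so the loader is
# always marked deferred before its blocking work runs.
module AsyncAndDeferred
  def self.included(loader_class)
    loader_class.prepend(AutoDefer)
  end

  module AutoDefer
    # Flag the loader as deferred before performing; a real executor
    # would use this flag to schedule the loader after non-deferred ones.
    def perform(keys)
      @deferred = true
      super
    end
  end

  def deferred?
    !!@deferred
  end
end

class ExampleLoader
  include AsyncAndDeferred

  def perform(keys)
    keys.map(&:to_s)
  end
end

loader = ExampleLoader.new
p loader.perform([1, 2]) # => ["1", "2"]
p loader.deferred?       # => true
```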

Contributor

I like that better, especially for the reason that it avoids a potential footgun of not calling defer manually. @amomchilov also mentioned expressing this as a separate class, so maybe the module is the solution.

Contributor Author

I pushed a commit with a GraphQL::Batch::Async module. What do you think @amomchilov @swalkinshaw ?

Contributor

@amomchilov amomchilov left a comment

General idea

I really like this feature idea, and it's definitely addressing a real need that we've repeatedly encountered.

Having the parallelism limited to "within" a single batch loader has been a real issue: it unnecessarily holds back the execution of promises for already-finished work until all work is done.

Documentation considerations

These are really tricky concepts that I believe would be difficult for most devs to approach without significant prior experience in this area.

E.g. this is a really nuanced distinction, that I only understand because I've been in these particular weeds before :D

Actually, the http_loader example isn't great, since it's async/concurrent but not batching. It could start the futures in load, which is in the eager phase of resolution. This PR allows for batch loaders that are both async/concurrent and batching, and also concurrent with each other

I think adding more method-level docs on the relevant pieces would help, but also more complete "guide"-style docs, like in the README.

Implementation considerations

I wonder if these different behaviours could be better expressed if they were two separate types? Naming is hard, but perhaps something like StreamableLoader vs BatchLoader?


Would love to hear your thoughts! Thanks for working on this

# At this point, all of the concurrent work has been started.

# This converges back in, waiting on each concurrent future to finish, and fulfilling each
# (non-concurrent) Promise.rb promise.
operations.each_with_index.each do |operation, index|
-      fulfill(operation, futures[index].value) # .value is a blocking call
+      fulfill(operation, futures[index].value!) # .value is a blocking call
Contributor

Suggested change
-fulfill(operation, futures[index].value!) # .value is a blocking call
+fulfill(operation, futures[index].value!) # .value! is a blocking call

end

-  def perform(operations)
+  def perform_on_wait(operations)
Contributor

Should this perhaps be:

Suggested change
-def perform_on_wait(operations)
+def perform_on_any_wait(operations)

To match the naming of Loader#on_any_wait

@@ -126,6 +136,36 @@ def fulfilled?(key)
promise.pending? && promise.source != self
end

def perform_on_wait(keys)
# FIXME: Better name?
# Interface to add custom code to e.g. trigger async operations when any loader starts waiting.
Contributor

Could you move these comments into the method docs above the def? That way they show on hover, in generated docs, etc.

@tgwizard tgwizard force-pushed the properly-async-and-deferred-loaders branch from f5ee2ab to 864d9af Compare October 14, 2024 11:38
@tgwizard tgwizard force-pushed the properly-async-and-deferred-loaders branch from 864d9af to 50227fe Compare October 14, 2024 11:39
@@ -42,13 +42,14 @@ def current_executor
end
end

-    attr_accessor :loader_key, :executor
+    attr_accessor :loader_key, :executor, :deferred
Contributor

I'm concerned about the deferred terminology being conflated with GraphQL defer, to which this isn't directly related. Could this remain consistent with the "async" terminology, or be expressed on some kind of lazy priority scale?

Contributor Author

Yes, I agree. I'd like to use yield and yielded, which is what https://graphql-ruby.org/dataloader/parallelism uses, but that's because it uses fibers. We're conceptually "yielding control" here too.

WDYT @gmac @swalkinshaw @amomchilov ?

Contributor

Since this is an internal implementation detail now, why not just call it async? And it's more consistent with the Async module naming.

We could even get rid of this entirely and just filter out loaders that don't include GraphQL::Batch::Async? (though maybe that's worse for perf)
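That filtering idea could be sketched like this; the marker module and the loader classes here are hypothetical stand-ins:

```ruby
# Partition loaders by whether they include an Async marker module,
# instead of tracking a separate deferred flag on each loader.
module Async; end

class PlainLoader; end

class HttpLoader
  include Async
end

loaders = [PlainLoader.new, HttpLoader.new, PlainLoader.new]
async, sync = loaders.partition { |l| l.is_a?(Async) }
p async.size # => 1
p sync.size  # => 2
```

`is_a?` checks are cheap, so the perf concern would mostly be about running `partition` on every resolution pass rather than the check itself.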

Contributor Author

We can, though I think that's a worse API. I think `yield` is a nice fundamental API that can be used beyond Async; Async builds on top of it (and the "perform on any wait" thing).

Contributor

Can you put a commit up showing what it looks like?

@tgwizard
Contributor Author

I won't have time to test this out before my last day at Shopify (tomorrow). Leaving it hanging here for someone to pick up or close.
