-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Context and Endpoint classes to enable non-Communicator use-cases #166
Conversation
3b77f2c
to
87a034a
Compare
Tests are broken still
Add EndpointConfig struct to remove duplication of constants and get configuration options out of the function calls.
This deadlock was introduced when updating sendrecv_test.cu for Connection being returned in a future.
87a034a
to
2426e8e
Compare
I rebased to main, which had a conflict related to the closing of the CUDA IPC handle on RegisteredMemory deconstruction. I ended up improving the whole RegisteredMemory serialization such that now the whole thing round-trips through multiple rounds of serialization/deserialization. There is a new I also resurrected one test that was related to this (it had an unrelated bug). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Could you provide a demonstration on how to set up a connection using Context
? This would make the workflow easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great
@Binyang2014 for the demonstration of how to use |
Yes, the comment looks good, but sample code would be even easier to follow. The idea of Python testing sounds good to me. |
This PR implements and closes #137. The new
Endpoint
andContext
classes expose the connection establishing functionality fromCommunicator
, which now is only responsible for tying together the bootstrapper with a context.The largest breaking change here is that
Communicator.connectOnSetup(...)
now returns theConnection
wrapped inside aNonblockingFuture
. This is because with the wayContext
is implemented aConnection
is now fully initialized on construction.Some smaller breaking API changes from this change are that
RegisteredMemory
no longer has arank()
function (as there maybe no concept of rank), and similarlyConnection
has noremoteRank()
andtag()
functions. The latter are replaced byremoteRankOf
andtagOf
functions inCommunicator
.A new
EndpointConfig
class is introduced to avoid duplication of the IB configuration parameters in the APIs ofContext
andCommunicator
. The usual usage pattern of just passing in aTransport
still works due to an implicit conversion intoEndpointConfig
.Miscellaneous changes:
-Cleans up how the PIMPL pattern is applied by making both the
Impl
struct and thepimpl_
pointers private for all relevant classes in the core API.-Enables ctest to be run from the build root directory.