-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Address Leiden numbering issue #4845
base: branch-25.02
Are you sure you want to change the base?
Address Leiden numbering issue #4845
Conversation
@@ -713,6 +713,57 @@ std::pair<size_t, weight_t> leiden( | |||
|
|||
detail::flatten_leiden_dendrogram(handle, graph_view, *dendrogram, clustering); | |||
|
|||
// Get unique cluster id | |||
size_t local_num_verts = (*dendrogram).get_level_size_nocheck(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dendrogram->get_level_size_nocheck(0);
cpp/src/community/leiden_impl.cuh
Outdated
rmm::device_uvector<vertex_t> unique_cluster_ids(local_num_verts, handle.get_stream()); | ||
|
||
thrust::copy(handle.get_thrust_policy(), | ||
clustering, | ||
clustering + local_num_verts, | ||
unique_cluster_ids.begin()); | ||
|
||
thrust::sort(handle.get_thrust_policy(), unique_cluster_ids.begin(), unique_cluster_ids.end()); | ||
|
||
unique_cluster_ids.resize(thrust::distance(unique_cluster_ids.begin(), | ||
thrust::unique(handle.get_thrust_policy(), | ||
unique_cluster_ids.begin(), | ||
unique_cluster_ids.end())), | ||
handle.get_stream()); | ||
|
||
if constexpr (multi_gpu) { | ||
auto recvcounts = cugraph::host_scalar_allgather( | ||
handle.get_comms(), unique_cluster_ids.size(), handle.get_stream()); | ||
|
||
std::vector<size_t> displacements(recvcounts.size()); | ||
std::exclusive_scan(recvcounts.begin(), recvcounts.end(), displacements.begin(), size_t{0}); | ||
rmm::device_uvector<vertex_t> allgathered_unique_cluster_ids( | ||
displacements.back() + recvcounts.back(), handle.get_stream()); | ||
cugraph::device_allgatherv(handle.get_comms(), | ||
unique_cluster_ids.begin(), | ||
allgathered_unique_cluster_ids.begin(), | ||
recvcounts, | ||
displacements, | ||
handle.get_stream()); | ||
|
||
thrust::sort(handle.get_thrust_policy(), | ||
allgathered_unique_cluster_ids.begin(), | ||
allgathered_unique_cluster_ids.end()); | ||
|
||
allgathered_unique_cluster_ids.resize( | ||
thrust::distance(allgathered_unique_cluster_ids.begin(), | ||
thrust::unique(handle.get_thrust_policy(), | ||
allgathered_unique_cluster_ids.begin(), | ||
allgathered_unique_cluster_ids.end())), | ||
handle.get_stream()); | ||
|
||
detail::relabel_cluster_ids<vertex_t, multi_gpu>( | ||
handle, allgathered_unique_cluster_ids, clustering, local_num_verts); | ||
|
||
} else { | ||
detail::relabel_cluster_ids<vertex_t, multi_gpu>( | ||
handle, unique_cluster_ids, clustering, local_num_verts); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but please test it rigorously.
Currently we don't have a test to check if the produced cluster ids are consecutive, starting from 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to comment out the lines
https://github.com/rapidsai/cugraph/blob/2b1ac70272c2f9a08d688496c4ae88104acb2c62/cpp/src/community/leiden_impl.cuh#L606C1-L627C4
as it's used differently now.
cpp/src/community/leiden_impl.cuh
Outdated
std::exclusive_scan(recvcounts.begin(), recvcounts.end(), displacements.begin(), size_t{0}); | ||
rmm::device_uvector<vertex_t> allgathered_unique_cluster_ids( | ||
displacements.back() + recvcounts.back(), handle.get_stream()); | ||
cugraph::device_allgatherv(handle.get_comms(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By doing an allgatherv we are assuming that the entire list of cluster ids will fit in the available GPU memory of all GPUs. It's not clear to me... if we have a large graph on thousands of GPUs that doesn't cluster well that this is a safe assumption.
It's probably safer (for scalability purposes) to shuffle things to different GPUs and each generate their own unique subset. So I'd suggest:
- Use the
remove_duplicates
defined earlier in this file which already does the sort/unique on a list and handles SG or MG - I think this result can be passed into
relabel_cluster_ids
directly which would greatly simplify this code.
cpp/src/community/leiden_impl.cuh
Outdated
cluster_ids_size_per_rank.end(), | ||
cluster_ids_starts.begin(), | ||
size_t{0}); | ||
auto& comm = handle.get_comms(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, functionally.
If you're going to pull out &comm
from the handle, I'd do that at line 612 (right as you enter the if block and use comm
also in the call to host_scalar_allgather
. Otherwise I would just use handle.get_comms().get_rank()
as the index in the next line.
But that's purely cosmetic.
…2_fix_leiden_numbering
Our current implementation of Leiden can return non contiguous cluster IDs however, there is an unused utility function relabel_cluster_ids that serves the purpose of relabeling.
This PR
relabel_cluster_ids
after flattening the dendrogram.closes #4791