Address Leiden numbering issue #4845

Open

wants to merge 18 commits into base: branch-25.02
Conversation

@jnke2016 (Contributor) commented Jan 3, 2025

Our current implementation of Leiden can return non-contiguous cluster IDs; however, there is an unused utility function, relabel_cluster_ids, that serves the purpose of relabeling them.

This PR

  • Addresses the Leiden numbering issue from #4791 by calling relabel_cluster_ids after flattening the dendrogram.
  • Fixes a bug in the MG Python API of Leiden: the C++ API requires a different seed for each GPU.
  • Adds a Python test capturing the numbering issue (TODO).

closes #4791

@@ -713,6 +713,57 @@ std::pair<size_t, weight_t> leiden(

detail::flatten_leiden_dendrogram(handle, graph_view, *dendrogram, clustering);

// Get unique cluster id
size_t local_num_verts = (*dendrogram).get_level_size_nocheck(0);
Contributor suggested change:

dendrogram->get_level_size_nocheck(0);

Comment on lines 718 to 766
  rmm::device_uvector<vertex_t> unique_cluster_ids(local_num_verts, handle.get_stream());

  thrust::copy(handle.get_thrust_policy(),
               clustering,
               clustering + local_num_verts,
               unique_cluster_ids.begin());

  thrust::sort(handle.get_thrust_policy(), unique_cluster_ids.begin(), unique_cluster_ids.end());

  unique_cluster_ids.resize(
    thrust::distance(unique_cluster_ids.begin(),
                     thrust::unique(handle.get_thrust_policy(),
                                    unique_cluster_ids.begin(),
                                    unique_cluster_ids.end())),
    handle.get_stream());

  if constexpr (multi_gpu) {
    auto recvcounts = cugraph::host_scalar_allgather(
      handle.get_comms(), unique_cluster_ids.size(), handle.get_stream());

    std::vector<size_t> displacements(recvcounts.size());
    std::exclusive_scan(recvcounts.begin(), recvcounts.end(), displacements.begin(), size_t{0});
    rmm::device_uvector<vertex_t> allgathered_unique_cluster_ids(
      displacements.back() + recvcounts.back(), handle.get_stream());
    cugraph::device_allgatherv(handle.get_comms(),
                               unique_cluster_ids.begin(),
                               allgathered_unique_cluster_ids.begin(),
                               recvcounts,
                               displacements,
                               handle.get_stream());

    thrust::sort(handle.get_thrust_policy(),
                 allgathered_unique_cluster_ids.begin(),
                 allgathered_unique_cluster_ids.end());

    allgathered_unique_cluster_ids.resize(
      thrust::distance(allgathered_unique_cluster_ids.begin(),
                       thrust::unique(handle.get_thrust_policy(),
                                      allgathered_unique_cluster_ids.begin(),
                                      allgathered_unique_cluster_ids.end())),
      handle.get_stream());

    detail::relabel_cluster_ids<vertex_t, multi_gpu>(
      handle, allgathered_unique_cluster_ids, clustering, local_num_verts);

  } else {
    detail::relabel_cluster_ids<vertex_t, multi_gpu>(
      handle, unique_cluster_ids, clustering, local_num_verts);
  }

@naimnv (Contributor) commented Jan 6, 2025
LGTM, but please test it rigorously.
Currently we don't have a test to check if the produced cluster ids are consecutive, starting from 0.


std::exclusive_scan(recvcounts.begin(), recvcounts.end(), displacements.begin(), size_t{0});
rmm::device_uvector<vertex_t> allgathered_unique_cluster_ids(
  displacements.back() + recvcounts.back(), handle.get_stream());
cugraph::device_allgatherv(handle.get_comms(),
Collaborator commented:

By doing an allgatherv we are assuming that the entire list of cluster ids will fit in the available GPU memory of all GPUs. It's not clear to me that this is a safe assumption if we have a large graph on thousands of GPUs that doesn't cluster well.

It's probably safer (for scalability purposes) to shuffle things to different GPUs and each generate their own unique subset. So I'd suggest:

  • Use the remove_duplicates function defined earlier in this file, which already does the sort/unique on a list and handles both SG and MG.
  • I think this result can be passed into relabel_cluster_ids directly, which would greatly simplify this code.

                    cluster_ids_size_per_rank.end(),
                    cluster_ids_starts.begin(),
                    size_t{0});
auto& comm = handle.get_comms();
Collaborator commented:
Looks good, functionally.

If you're going to pull out &comm from the handle, I'd do that at line 612 (right as you enter the if block) and use comm also in the call to host_scalar_allgather. Otherwise I would just use handle.get_comms().get_rank() as the index in the next line.

But that's purely cosmetic.

@jnke2016 jnke2016 self-assigned this Jan 9, 2025
@jnke2016 jnke2016 added this to the 25.02 milestone Jan 9, 2025
Successfully merging this pull request may close these issues.

[BUG]: Leiden clustering numbering is off
3 participants