
Add support for context parallelism #1299

Open
bclyang wants to merge 1 commit into main from add-context-parallel-support

Conversation

bclyang (Contributor) commented Oct 1, 2024

Adds context parallelism with ring attention.
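
For readers unfamiliar with the approach, here is a conceptual sketch of ring attention (not the code in this PR): each context-parallel rank keeps its query slice, key/value blocks rotate around the ring, and partial attention outputs are merged with an online (log-sum-exp) softmax. The shapes, the absence of a causal mask, and the assumption that the context-parallel group spans the whole world are all simplifications.

```python
import torch
import torch.distributed as dist

def _ring_shift(block: torch.Tensor) -> torch.Tensor:
    """Send the local block to the next rank and receive one from the previous rank.
    For simplicity this assumes the context-parallel group is the entire world."""
    rank, world = dist.get_rank(), dist.get_world_size()
    recv = torch.empty_like(block)
    ops = [
        dist.P2POp(dist.isend, block.contiguous(), (rank + 1) % world),
        dist.P2POp(dist.irecv, recv, (rank - 1) % world),
    ]
    for req in dist.batch_isend_irecv(ops):
        req.wait()
    return recv

def ring_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Conceptual sketch of ring attention (forward pass only, no causal mask).
    q, k, v: [batch, local_seq, heads, head_dim] -- each rank holds one slice of
    the full sequence; key/value blocks rotate around the ring while partial
    attention results are folded into a running (log-sum-exp) softmax, so no
    rank ever materializes attention over the full sequence at once."""
    scale = q.shape[-1] ** -0.5
    out = torch.zeros_like(q)                                       # running weighted sum of values
    lse = torch.full(q.shape[:-1], float("-inf"),
                     dtype=q.dtype, device=q.device)                # running log-sum-exp, [b, q, h]

    k_block, v_block = k, v
    for step in range(dist.get_world_size()):
        # Attention of the local queries against the current key/value block.
        scores = torch.einsum("bqhd,bkhd->bhqk", q, k_block) * scale
        block_lse = torch.logsumexp(scores, dim=-1).permute(0, 2, 1)             # [b, q, h]
        block_out = torch.einsum("bhqk,bkhd->bqhd", scores.softmax(-1), v_block)

        # Fold this block into the running softmax.
        new_lse = torch.logaddexp(lse, block_lse)
        out = out * torch.exp(lse - new_lse).unsqueeze(-1) \
              + block_out * torch.exp(block_lse - new_lse).unsqueeze(-1)
        lse = new_lse

        # Rotate key/value blocks to the next rank for the following step.
        if step < dist.get_world_size() - 1:
            k_block, v_block = _ring_shift(k_block), _ring_shift(v_block)
    return out
```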

See the WandB report for training runs used to test correctness with a simple 410M config: https://wandb.ai/brandony/neox/reports/Test-context-parallelism-correctness--Vmlldzo5NTU4ODc1.

  • Checked the following settings: no MP/CP, MP 4, CP 4, and MP 2 combined with CP 2
  • Confirmed that the loss matches exactly across these settings
  • Memory usage and training speed look reasonable

Based on the initial PR #1266, with the following changes to get things working:

  • All-reduce gradients across context-parallel ranks by piggybacking on the data-parallel (DP) group (see the sketch after this list)
  • Fix parallelism initialization
  • Remove unnecessary code
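
The first bullet is the key change: because each context-parallel rank only processes a slice of the sequence, its gradients have to be averaged just like data-parallel replicas. A minimal sketch of the idea, assuming a (data, context, model) rank layout; the helper names and layout below are illustrative, not the exact code in this PR.

```python
import torch
import torch.distributed as dist

def build_grad_allreduce_group(world_size: int, mp_size: int, cp_size: int):
    """Illustrative sketch: fold context-parallel ranks into the group used for
    the data-parallel gradient all-reduce. Assumes ranks are laid out so that
    model-parallel varies fastest, then context-parallel, then data-parallel."""
    dp_size = world_size // (mp_size * cp_size)
    my_group = None
    # One group per model-parallel rank: it spans every DP and CP rank that
    # holds the same model shard, so the usual gradient all-reduce also
    # averages over context-parallel ranks.
    for mp_rank in range(mp_size):
        ranks = [dp * cp_size * mp_size + cp * mp_size + mp_rank
                 for dp in range(dp_size) for cp in range(cp_size)]
        group = dist.new_group(ranks)  # must be created on every rank
        if dist.get_rank() in ranks:
            my_group = group
    return my_group

def allreduce_grads(model: torch.nn.Module, group) -> None:
    """Average gradients over the combined DP x CP group after backward()."""
    size = dist.get_world_size(group)
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, group=group)
            p.grad.div_(size)
```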

bclyang force-pushed the add-context-parallel-support branch 2 times, most recently from 2b0bd92 to 7f168f0 on October 1, 2024 12:28
CLAassistant commented Oct 1, 2024

CLA assistant check
All committers have signed the CLA.

bclyang force-pushed the add-context-parallel-support branch from 7f168f0 to 4969683 on October 1, 2024 12:32
@@ -37,7 +37,7 @@ def __init__(
normalized_shape,
eps=1e-5,
no_persist_layer_norm=True,
sequence_parallel=False,

A repository Member commented on this change:

@bclyang -- Shouldn't these remain sequence_parallel to match our previous support for megatron-style sequence parallelism (essentially just TP applied to layernorm and dropout)?
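
For context on the distinction the reviewer is drawing: Megatron-style sequence parallelism shards layernorm/dropout activations along the sequence dimension across the tensor-parallel group, which is orthogonal to the context parallelism this PR adds. A rough sketch of why the flag matters on a layernorm, under the assumption (as in Megatron) that the flag only tags parameters so their gradients are all-reduced over the TP group; the class below is hypothetical, not this repository's norm implementation.

```python
import torch.nn as nn

class LayerNormWithSequenceParallel(nn.Module):
    """Hypothetical illustration of Megatron-style sequence parallelism on a
    layernorm: the input is already sharded along the sequence dimension across
    the tensor-parallel group, so the math is unchanged, but the parameters are
    tagged so the gradient-sync code knows to all-reduce their gradients over
    the tensor-parallel group (each TP rank only saw part of the sequence)."""

    def __init__(self, normalized_shape, eps=1e-5, sequence_parallel=False):
        super().__init__()
        self.norm = nn.LayerNorm(normalized_shape, eps=eps)
        for p in self.norm.parameters():
            # Marker attribute consumed by gradient-synchronization code elsewhere.
            setattr(p, "sequence_parallel", sequence_parallel)

    def forward(self, x):
        # x: [local_seq, batch, hidden], sharded on dim 0 when sequence_parallel=True.
        return self.norm(x)
```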
