Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Init Implementation per_stream batching #438

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

schopra8
Copy link

Before submitting
  • Was this discussed/agreed via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fixes #434

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@schopra8 schopra8 requested a review from tchaton as a code owner December 22, 2024 01:07
@schopra8
Copy link
Author

@tchaton When is the CacheDataLoader used vs StreamingDataLoader in conjunction with a CombinedStreamingDataset?

I'm trying to send messages to iterators and process them within their worker loops -- when the iterator must change it's dataset index (i.e., on batch boundaries). I'm getting confused by the difference between _MultiProcessingDataLoaderIterPatch and _StreamingMultiProcessingDataLoaderIter. Seems like the former is used in CacheDataLoader while the latter is used in StreamingDataLoader.

@Borda Borda changed the title Init Implementation per_stream batching Init Implementation per_stream batching Jan 9, 2025
for batch in super().__iter__():
print("Fetched a batch ...")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use loggier instead of print

Copy link

codecov bot commented Jan 9, 2025

Codecov Report

Attention: Patch coverage is 55.26316% with 17 lines in your changes missing coverage. Please review.

Project coverage is 78%. Comparing base (bd116a4) to head (32abd77).

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #438   +/-   ##
===================================
- Coverage    78%    78%   -0%     
===================================
  Files        35     35           
  Lines      5106   5141   +35     
===================================
+ Hits       4006   4024   +18     
- Misses     1100   1117   +17     

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Advanced Batching Logic with CombinedStreamingDataset
2 participants