
Parallelizing data load and training #68

Open
mallela opened this issue Feb 7, 2019 · 1 comment

Comments


mallela commented Feb 7, 2019

Hello!

I read in another issue that you load data and perform training in parallel. I was wondering how exactly you do that, because the bottleneck does not seem to be training (~0.06 s per step) but the data pre-processing/fetching (augmentation using an imgaug Sequential takes ~0.8 s; loading the .h5 files takes ~0.2 s). I am using a batch size of 120.

Are you using multiprocessing or the TF data input pipeline?

Thanks,
Praneeta

@markus-hinsche

Hi Praneeta!
In TensorFlow, the dataset.map() method has a num_parallel_calls parameter, which runs the mapped preprocessing function on multiple elements in parallel so it overlaps with training.

See how we use it in our training input pipeline for this paper:
https://github.com/merantix/imitation-learning/blob/master/imitation/input_fn.py#L100
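For illustration, a minimal tf.data sketch of the idea (not the actual code in input_fn.py linked above; preprocess, the parallelism value of 8, and the file-path input are placeholders):

```python
import tensorflow as tf

def preprocess(path):
    # Placeholder for the per-sample work: in the question's setup this would
    # be loading the .h5 sample and running the imgaug augmentation (e.g.
    # wrapped in tf.py_func / tf.numpy_function so it runs on CPU threads).
    return path

def input_fn(file_paths, batch_size=120):
    dataset = tf.data.Dataset.from_tensor_slices(file_paths)
    # Run the expensive preprocessing on several elements at once so it
    # overlaps with the training step instead of blocking it.
    dataset = dataset.map(preprocess, num_parallel_calls=8)
    dataset = dataset.batch(batch_size)
    # Keep one batch prepared ahead of the training loop.
    dataset = dataset.prefetch(1)
    return dataset
```

With num_parallel_calls the map work is spread over multiple CPU threads, and prefetch keeps the next batch ready while the current one trains, so the ~0.8 s of augmentation can be largely hidden behind the training step.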
