Labels: help wanted
Description
🚀 The feature
Currently, `IterToMap` loads all data from the prior `IterDataPipe` as soon as the first `__getitem__` call is made, here:
https://github.com/pytorch/data/blob/13b574c80e8732744fee6ab9cb7e35b5afc34a3c/torchdata/datapipes/iter/util/converter.py#L78
Instead, we could stop loading data from the prior `IterDataPipe` as soon as the requested index is found, keeping everything read so far cached for later lookups. We might also need a flag to prevent loading the data multiple times.
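A minimal sketch of the proposed lazy loading, written in plain Python and independent of torchdata's actual classes. `LazyIterToMap`, `key_value_fn` and `_exhausted` are hypothetical names chosen for illustration; `_exhausted` plays the role of the flag mentioned above, recording that the source has been fully consumed so it is never re-read.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, Iterator, Optional, Tuple


class LazyIterToMap:
    """Hypothetical lazy variant of IterToMap (not the torchdata API).

    Items are pulled from the source only until the requested key is
    found; everything read so far is cached in a dict.
    """

    def __init__(
        self,
        source: Iterable[Any],
        key_value_fn: Optional[Callable[[Any], Tuple[Hashable, Any]]] = None,
    ) -> None:
        self._source = source
        # Hypothetical: by default each item is assumed to already be a (key, value) pair.
        self._key_value_fn = key_value_fn or (lambda item: item)
        self._map: Dict[Hashable, Any] = {}
        self._itr: Optional[Iterator[Any]] = None
        # Flag: set once the source is fully consumed, so it is never read again.
        self._exhausted = False

    def _load_next(self) -> bool:
        """Cache one more item; return False once the source is exhausted."""
        if self._exhausted:
            return False
        if self._itr is None:
            # Create the iterator once and reuse it, so no item is loaded twice.
            self._itr = iter(self._source)
        try:
            item = next(self._itr)
        except StopIteration:
            self._exhausted = True
            self._itr = None
            return False
        key, value = self._key_value_fn(item)
        self._map[key] = value
        return True

    def __getitem__(self, key: Hashable) -> Any:
        # Keep loading until the key appears or the source runs dry.
        while key not in self._map:
            if not self._load_next():
                raise KeyError(key)
        return self._map[key]

    def __len__(self) -> int:
        # The length is only known after the whole source has been consumed.
        while self._load_next():
            pass
        return len(self._map)
```

With this sketch, a lookup like `dp[3]` on a fresh pipe pulls only as many items as it takes to reach key `3`. Two trade-offs are worth noting: `__len__` still forces a full pass over the source, and if the source contains duplicate keys, a lookup served before a later duplicate has been read may return a different value than it would after the whole source has been loaded.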
Motivation, pitch
This would improve performance when users simply iterate over the `MapDataPipe`, since nothing has to be pre-loaded at the start of iteration; in effect, it would simulate the behavior of an `IterDataPipe`.
Alternatives
No response
Additional context
No response