dynamic padding via collate_fn #761

@Jomonsugi

Description

I would like to dynamically pad my tensors via the collate_fn argument that can be passed to petastorm.pytorch.DataLoader, but make_batch_reader seemingly thwarts this: it appears to prevent the user from adjusting tensor sizes through the DataLoader.

Or is this possible and I'm just missing how to do it? collate_fn could take care of the variable-length values on a batch-by-batch basis. Otherwise it seems I would need to pad all the data in my Spark DataFrame, which substantially increases data size, slows training, and, I assume, slows I/O through petastorm in general.

What I would like to do looks something like the snippet below, where the function passed to collate_fn would dynamically pad my variable-length values.

from petastorm import make_batch_reader
from petastorm.pytorch import DataLoader

reader = make_batch_reader(
    channel,
    workers_count=2,
    num_epochs=1,
    schema_fields=['input', 'labels'],
)

dl = DataLoader(reader,
                batch_size=8,
                shuffling_queue_capacity=100000,
                collate_fn=some_padding_function)
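For illustration, a padding collate_fn along these lines might look like the minimal sketch below. It is an assumption-laden example: it assumes each batch arrives as a list of row dicts keyed by the schema fields above ('input' holding a variable-length sequence, 'labels' a scalar), and it pads with torch.nn.utils.rnn.pad_sequence. some_padding_function is just the hypothetical name from the snippet, not a petastorm API.

import torch
from torch.nn.utils.rnn import pad_sequence

def some_padding_function(batch):
    # Hypothetical sketch: pad each batch's variable-length 'input'
    # sequences to the longest sequence in that batch, instead of
    # pre-padding the whole Spark DataFrame.
    inputs = [torch.as_tensor(row['input']) for row in batch]
    labels = torch.as_tensor([row['labels'] for row in batch])
    return {'input': pad_sequence(inputs, batch_first=True, padding_value=0),
            'labels': labels}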
