Description
Describe the issue:
The dataloader accumulates GPU memory across batches unless gc.collect() is called manually, either after each batch or periodically (e.g. after every 5th batch). In the example below, manually calling garbage collection reduces the maximum GPU memory usage by around 7 GiB (11 GiB vs. 18 GiB). Is there a way to free GPU memory more reliably after each batch?
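For reference, the "every Nth batch" workaround mentioned above could be sketched roughly as follows. This is only a minimal sketch: `iterate_with_gc` and its `every` parameter are hypothetical names introduced here for illustration and are not part of merlin-dataloader. It also assumes the retained memory is held by Python objects that only the cyclic garbage collector reclaims, which I have not confirmed.

```python
import gc
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def iterate_with_gc(batches: Iterable[T], every: int = 5) -> Iterator[T]:
    """Yield each item of `batches`, calling gc.collect() after every `every`-th item.

    Hypothetical helper for illustration; not part of merlin-dataloader.
    """
    if every < 1:
        raise ValueError("every must be >= 1")
    for i, batch in enumerate(batches, start=1):
        yield batch
        # Execution resumes here when the consumer requests the next item,
        # so this collection mainly reclaims garbage left by earlier batches.
        if i % every == 0:
            gc.collect()


if __name__ == "__main__":
    # Stand-in for a real loader: any iterable of batches works.
    for batch in iterate_with_gc(range(12), every=5):
        pass
```

Wrapping the loader this way keeps the training loop unchanged, but it is still a workaround rather than a fix for the underlying accumulation.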
Minimal Complete Verifiable Example:
Create example data:
import pandas as pd
import numpy as np
n_samples = 20480
df = pd.DataFrame({
    'x': [np.random.uniform(size=(19357, )).astype('f4') for _ in range(n_samples)],
    'y': np.random.choice(range(100), size=n_samples).astype('i8')
})
df.to_parquet('test.parquet', row_group_size=1024, engine='pyarrow')
Check memory usage:
import merlin.io
from merlin.dataloader.torch import Loader
from merlin.schema import ColumnSchema, Schema
import gc
from pynvml import nvmlDeviceGetMemoryInfo, nvmlDeviceGetHandleByIndex
dataset = merlin.io.Dataset(
    'test.parquet',
    engine='parquet',
    part_size='180MB',
    schema=Schema([
        ColumnSchema(
            'x', dtype='float32',
            is_list=True, is_ragged=False,
            properties={'value_count': {'max': 19357}}
        ),
        ColumnSchema('y', dtype='int64')
    ])
)
print(dataset.partition_lens[:10]) # --> [2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048]
def benchmark(dataset, batch_size=4096, n_samples=1_000_000, call_gc=False):
    handle = nvmlDeviceGetHandleByIndex(0)
    max_memory = nvmlDeviceGetMemoryInfo(handle).used
    num_iter = n_samples // batch_size
    loader = Loader(dataset, batch_size=batch_size, shuffle=True, drop_last=True).epochs(100)
    for i, (batch, _) in enumerate(loader):
        x, y = batch['x'], batch['y']
        max_memory = max((max_memory, nvmlDeviceGetMemoryInfo(handle).used))
        if call_gc:
            gc.collect()
        if i == num_iter:
            break
    loader.stop()
    gc.collect()
    return max_memory
Without manually calling garbage collection:
max_mem = benchmark(dataset, batch_size=4096, n_samples=300_000, call_gc=False)
print('Max GPU memory usage:', max_mem // 1024**2 , 'MiB') # --> Gives: Max GPU memory usage: 18435 MiB
When manually calling garbage collection:
max_mem = benchmark(dataset, batch_size=4096, n_samples=300_000, call_gc=True)
print('Max GPU memory usage:', max_mem // 1024**2 , 'MiB') # --> Gives: Max GPU memory usage: 11305 MiB
Environment:
OS: Rocky Linux 8.7
Python: 3.10.9
merlin-core: 0.10.0
merlin-dataloader: 0.0.4
cudf-cu11: 23.02
rmm-cu11: 23.02
dask-cudf: 23.02
I installed both cudf + merlin via pip:
python -m pip install cudf-cu11==23.02 rmm-cu11==23.02 dask-cudf-cu11==23.02 --extra-index-url https://pypi.nvidia.com/
python -m pip install merlin-dataloader
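In case it helps with reproducing, the installed versions can be checked with something like the sketch below. The distribution names are taken from the pip commands and the environment list above; `installed_versions` is a hypothetical helper name, and whether `merlin-core` is the exact distribution name on a given system is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used in the pip commands and environment list above.
PACKAGES = [
    "merlin-core",
    "merlin-dataloader",
    "cudf-cu11",
    "rmm-cu11",
    "dask-cudf-cu11",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```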