H5 Manager is a tiny, batteries‑included toolkit that turns folders of images into tidy HDF5 datasets, stitches those datasets together, and lets you flip through them in seconds, all from a single command‑line interface.
While experimenting with computer‑vision side projects I kept bouncing between raw JPEG folders and a pile of ad‑hoc HDF5 scripts. Each time I forgot a flag or lost the code snippet. H5 Manager is my attempt to freeze that knowledge into a reusable, well‑tested package:
- Convert an arbitrary image tree into a single images dataset (uint8 | int16) with one flag.
- Merge any number of compatible .h5 files (optionally shuffled).
- Visualise the first few or page through the whole set with arrow keys.
- Configure defaults in ~/.h5manager/config.yaml or an env var.
- Ships tiny example data & pytest suite so CI and newcomers can try it instantly.
If you’ve ever asked yourself…
- “What even is an HDF5 file and why should I care?”
- “Why does storing 10k images in folders grind my dataloader to a halt?”
- “Which one‑liner will regenerate this dataset six months from now?”
…then this repo is for you.
git clone git@github.com:ss4328/h5_manager_scripts.git
pip install -e .
# 1 Convert two image folders (resize to 64×64)
h5manager convert example_img_dir/seefood/test/hot_dog \
--output tmp/hotdog_64 --dim 64
h5manager convert example_img_dir/seefood/test/not_hot_dog \
--output tmp/nothotdog_64 --dim 64
# 2 Merge them into one dataset
h5manager merge --inputs tmp/hotdog_64.h5,tmp/nothotdog_64.h5 \
--output tmp/merged
# 3 Browse interactively (p/n to page)
h5manager visualize tmp/merged.h5
Screenshots below show the pager in action; notice the mix of classes once you hit →.
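For readers curious what step 2 amounts to, here is a minimal h5py sketch of a merge. It is not H5 Manager's actual implementation: the function name `merge_h5`, the `seed` parameter, and the assumption that every file keeps its images in a dataset called `images` are mine. It also loads everything into memory, so it only suits small sets.

```python
import h5py
import numpy as np


def merge_h5(inputs, output, shuffle=False, seed=0, key="images"):
    """Concatenate the `key` dataset of every input file into one output file."""
    arrays = []
    for path in inputs:
        with h5py.File(path, "r") as f:
            arrays.append(f[key][...])  # read the whole dataset into memory

    # "Compatible" is taken to mean: same dtype and same per-image shape.
    first = arrays[0]
    for arr in arrays[1:]:
        if arr.shape[1:] != first.shape[1:] or arr.dtype != first.dtype:
            raise ValueError("input datasets differ in image shape or dtype")

    merged = np.concatenate(arrays, axis=0)
    if shuffle:
        rng = np.random.default_rng(seed)  # seeded so the order is repeatable
        merged = merged[rng.permutation(len(merged))]

    with h5py.File(output, "w") as f:
        f.create_dataset(key, data=merged, compression="gzip")
    return merged.shape
```

Shuffling at merge time is what interleaves the two classes, so consecutive pages in the viewer are not all from one folder.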

Why not keep raw folders?
- HDF5 offers compression, contiguous storage, random access, and no per‑image filesystem overhead. Large CV datasets can load roughly 3‑10× faster than from loose files.
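The random‑access point is easy to see with h5py: slicing a dataset handle pulls in only the data needed for that slice, not the whole file. A small sketch, assuming the images live in a dataset named `images` (check your own file with `list(f.keys())`). `read_batch` is a hypothetical helper, not part of the CLI.

```python
import h5py


def read_batch(path, start, stop, key="images"):
    """Return images[start:stop] as a NumPy array without loading the rest."""
    with h5py.File(path, "r") as f:
        dset = f[key]            # lazy handle: no pixel data has been read yet
        return dset[start:stop]  # reads just the data covering this slice
```

A dataloader built this way opens one file and slices into it, instead of opening and decoding thousands of individual image files.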
Does this replace TFRecord / LMDB?
- No – it’s a lightweight alternative when you prefer pure‑Python tooling and the HDF5 ecosystem.
Can I store labels?
- For now the CLI is image‑only; add extra datasets (labels, bboxes) via h5py or open an issue for a feature request.
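Adding labels by hand takes only a few lines of h5py. The sketch below writes a `labels` dataset into an existing file. The dataset names `images` and `labels` and the helper `add_labels` are assumptions for illustration, not something the CLI creates or expects.

```python
import h5py
import numpy as np


def add_labels(path, labels, key="labels", images_key="images"):
    """Store one integer label per image next to the images dataset."""
    labels = np.asarray(labels, dtype=np.int64)
    with h5py.File(path, "a") as f:  # "a" opens read/write and keeps existing data
        n_images = f[images_key].shape[0]
        if labels.shape != (n_images,):
            raise ValueError(f"expected {n_images} labels, got shape {labels.shape}")
        if key in f:
            del f[key]  # overwrite a previously stored labels dataset
        f.create_dataset(key, data=labels)
```

For the hot dog example this could be called as `add_labels("tmp/hotdog_64.h5", [1] * n)` before merging, though note that a merge which only copies the images dataset would drop the labels again.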
- Shrink example dataset in examples/images/ (< 1 MB).
- Add Streamlit GUI (h5manager gui).
- Lazy / chunked writer for huge datasets.
Pull requests & bug reports are very welcome!
- National Center for Supercomputing Applications: creators of HDF5.
- Tomasz Golan: original mergeh5.py inspiration.
- NEON Science blog: approachable HDF5 primer.
Released under the MIT License. © 2025 Shivansh Suhane