A tiny Python module for balanced storage of files, in the manner of a persistent cache.
The goal is to avoid ending up with too many files in a single directory, which can cause performance problems under certain conditions or with specific tools.
The depth of the additional directories is configurable at instantiation time to accommodate different needs.
There could also be other use cases, such as splitting the top-level directories across different storage (local or remote mounts). Far better solutions exist for that case, but it may be acceptable for quick and dirty prototyping.
This is minimalist, to be as generic as possible and easy to tweak into something more complex as needed.
Meaning:
- No indexing is used, only MD5 sums of file names:
- Filenames must be unique within a whole store.
- Later modification or corruption of the file content will NOT trigger any error or warning. The checksum is computed on the file name, not the content.
- No safety belt is included. It is up to the user to implement one if required.
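As one possible safety belt, content integrity can be tracked outside the store. The sketch below is not part of balanced_storage: `content_digest` and `is_unchanged` are hypothetical helper names, and only the standard library's `hashlib` is used. It assumes the caller records the digest when the file is inserted and compares it again later.

```python
import hashlib


def content_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's content, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_digest):
    """Return True if the file content still matches the recorded digest."""
    return content_digest(path) == recorded_digest
```

Where the recorded digests are kept (a sidecar file, a database, and so on) is left open here, since the module itself uses no index.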
Was tested on:
- Linux
- FreeBSD
pip install balanced_storage
- topdir: String, full path to the top level directory of the store.
- depth: Integer, 0 to 32: levels of added directories used in balancing.
A depth of N uses the first N characters of the MD5 sum of the provided file name, one directory level per character (see Limitations above).
store = BalancedStorage(topdir='/home/me/files', depth=2)
new = store.insert_file('/some/where/foo.zip')
print(new)
>>> '/home/me/files/c/5/foo.zip'
- MD5 sum of foo.zip is c5f98effec1104de04a8b293d90c8220
- File is stored as: /home/me/files/c/5/foo.zip
- Original file /some/where/foo.zip is left untouched. It's only copied, not moved.
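The mapping from file name to stored path described above can be sketched as follows. This is an illustration only, not the module's actual code: `balanced_path` is a hypothetical name, and encoding the file name as UTF-8 before hashing is an assumption.

```python
import hashlib
import os


def balanced_path(topdir, filename, depth):
    """Build topdir/<c1>/<c2>/.../filename from the MD5 sum of the file name."""
    if not 0 <= depth <= 32:
        raise ValueError("depth must be between 0 and 32")
    # Hash the file name only, not the file content.
    # Assumption: the name is encoded as UTF-8 before hashing.
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # One directory level per leading hex character of the digest.
    levels = list(digest[:depth])
    return os.path.join(topdir, *levels, filename)
```

With `depth=0` the file lands directly in `topdir`; with `depth=32` every character of the digest becomes a directory level.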
path = store.get_path('foo.zip')
print(path)
>>> '/home/me/files/c/5/foo.zip'
To create an empty file directly in the store, that can be written to later on:
path = store.create_file('bar.txt')
with open(path, "w") as f:
    f.write("Hello World")
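Inode consumption for the extra directories increases greatly with depth: each level multiplies the number of possible directories by 16 (one per hexadecimal character), so the deepest level can hold up to 16^depth directories at full capacity, reaching 16^32 with depth 32.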
Choose the depth based on the file system's inode availability as well as the expected volume of files to manage: for example, there is no need to use depth=10 to manage a few hundred files.
On the other hand, if too small a value is used with a huge volume of files, some directories could end up holding too many files, degrading access time in some circumstances.
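A rough way to reason about this trade-off is sketched below. The helper names are hypothetical and not part of the module. The sketch assumes file names hash uniformly, so each of the 16^depth possible deepest-level directories (16 hexadecimal values per level) receives about the same share of files.

```python
def leaf_directories(depth):
    """Number of possible deepest-level directories: 16 hex values per level."""
    return 16 ** depth


def average_files_per_directory(file_count, depth):
    """Expected files per deepest-level directory, assuming uniform hashing."""
    return file_count / leaf_directories(depth)


def smallest_depth(file_count, max_files_per_directory):
    """Smallest depth keeping the average at or below the given limit."""
    for depth in range(33):  # valid depths are 0 to 32
        if average_files_per_directory(file_count, depth) <= max_files_per_directory:
            return depth
    return 32
```

Under these assumptions, `smallest_depth(300, 1000)` returns 0 (a few hundred files need no extra levels), while `smallest_depth(1_000_000, 1000)` returns 3, since depth 2 would still average about 3906 files per directory.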