Skip to content

eikeno/balanced_storage_python

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Description

A tiny python module intended to allow balanced storage of files in a persistent cache fashion.

The goal is to avoid ending up having too many files in a single directory, which can cause certain performance problems; at least under certain conditions or with specific tools.

Depth od additional directories is configurable at instantiation time to accommodate different needs.

There could also be other use cases, such as splitting top directories across different storage (local or remote mounts), even if far better solutions exist in that case - maybe OK for some quick and dirty prototyping situations.

Limitations

This is minimalist to be as generic as possible, and easy to be tweaked into something more complex as needed.

Meaning:

  • No indexing is used, only MD5 sums of files names are used:
    • Filenames must be unique within a whole store.
    • Later modification or corruption of the file content will NOT trigger any error or warning. Control sum is performed on file name, not content.
  • No safety belt included. To be implemented if required, by the user.

Was tested on:

  • Linux
  • FreeBSD

Installation

pip install balanced_storage

Arguments

  • topdir: String, full path to the top level directory of the store.
  • depth: Integer, 0 to 32: levels of added directories used in balancing.

The requested depth will use this number of first characters of the MD5 sum of the provided file name (See: Limitations above).

Example

store = BalancedStorage(topdir='/home/me/files', depth=2)

new = store.insert_file('/some/where/foo.zip')
print(new)
>>> '/home/me/files/c/5/foo.zip'
  • MD5 sum of foo.zip is c5f98effec1104de04a8b293d90c8220
  • File is stored as: /home/me/files/c/5/foo.zip
  • Original file /some/where/foo.zip is left untouched. It's only copied, not moved.

Obtain the path of a stored file by name

bs.get_path('foo.zip')
print(bs)
>>> '/home/me/files/0/7/foo.zip'

To create an empty file directly in the store, than can be written to later on:

f = store.create_file('bar.txt')
with open(f, "w"):
	f.write("Hello World")

Performance warning

Inodes consumption for the extra directories will greatly increase with depth number of dirs at full capacity with depth 32.

Choosing depth should be based on FS inodes availability as well as expected volume of files to manage: I.e. no need to use depth=10 to manage a few hundred files.

On the other hand, if a too small value is used with a huge volume of files, some directories could end up holding too much files, degrading access time in some circumstances.

About

A tiny python module intended to allow balanced storage of files in a persistent cache fashion

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published