CliZ is an adaptive compressor designed for tensor-structured meteorological data. It includes three executables: cliz_compress, cliz_decompress, and cliz_validate.

Compressing and decompressing a single data file:
1. Use cliz_compress to compress the data. Depending on the parameters, it may also generate a .cfg file and a .map file.
2. Use cliz_decompress to decompress the data, supplying the .cfg and .map files generated in step 1.
3. (Optional) If the original data is available, use cliz_validate to verify the quality of the compression (see the sketch after this list).
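
A minimal sketch of this workflow, assuming the executables are run from the build directory; file names, dimensions, and the error bound are placeholders, and the options are documented in detail below:

```sh
# Compress; with -set-cfg/-set-map, CliZ may generate temp.cfg and temp.map.
./cliz_compress -in temp.f32 -out temp.cliz -set-cfg temp.cfg -set-map temp.map \
                -dim4 12 t120 lat180 lng360 -type f32 -err ABS 1e-3
# Decompress; dimensions, type, and error bound are read from temp.cfg.
./cliz_decompress -in temp.cliz -out temp.dec.f32 -use-cfg temp.cfg -use-map temp.map
# Optionally verify the decompressed data against the original.
./cliz_validate -src temp.f32 -dec temp.dec.f32 \
                -dim4 12 t120 lat180 lng360 -type f32 -err ABS 1e-3
```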

Compressing and decompressing multiple similar data files:
1. Prepare a .cfg file and a .map file by running cliz_compress on one of the data files.
2. Compress the remaining data files with the same .cfg and .map files; these runs will not generate new .cfg or .map files.
3. Use cliz_decompress to decompress all files with the same .cfg and .map files.
4. (Optional) If the original data is available, use cliz_validate to verify compression quality (see the sketch after this list).
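
For example, after generating day01.cfg and day01.map from the first file, the remaining files can reuse them (a sketch; file names are placeholders):

```sh
# Reuse the shared .cfg/.map; these runs generate no new .cfg or .map files.
for f in day02 day03; do
    ./cliz_compress   -in "$f.f32"  -out "$f.cliz"    -use-cfg day01.cfg -use-map day01.map
    ./cliz_decompress -in "$f.cliz" -out "$f.dec.f32" -use-cfg day01.cfg -use-map day01.map
done
```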

CliZ is mainly written in C. We also provide a starter.py script to simplify the build and execution process. The starter.py script can:
1. Run several code_generation.py scripts to generate C source code (this step is skipped if those scripts haven't changed),
2. Invoke "make -j16" to compile the code,
3. Run the compression and decompression routines.

Please refer to the comments in starter.py for details on how to use it; a rough manual equivalent is sketched below.
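
The manual sequence that starter.py automates looks roughly like this (a sketch: only "make -j16" comes from the description above; the generator path and arguments are assumptions):

```sh
python code_generation.py           # regenerate C sources (path assumed; skipped when unchanged)
make -j16                           # compile the generated and hand-written C code
./cliz_compress -in ... -out ...    # then run compression/decompression (options below)
```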

To invoke cliz_compress and cliz_decompress from the command line, use the following options (a combined example follows the list):
1. -in {input_file_path}
   When compressing, this is the path to the original data file.
   When decompressing, this is the path to the compressed file.
2. -out {output_file_path}
   When compressing, this is the path to the compressed file.
   When decompressing, this is the path to the decompressed file.
3. -set-cfg {config_file_path} or -use-cfg {config_file_path} (optional)
   The configuration file describes how the data is interpreted and which compression algorithm is used; it also stores the dimensions, data type, and error bound.
   With -set-cfg, the program generates a new configuration file; with -use-cfg, it reads an existing one.
   When decompressing, only -use-cfg is allowed.
   If a configuration file is not given, CliZ won't try any optional optimizations.
4. -set-map {map_file_path} or -use-map {map_file_path} (optional)
   The map file is a table of quantization-bin classifications used in Huffman encoding.
   With -set-map, the program tests whether a map file helps and decides whether to generate one.
   With -use-map, the program uses the given map file if the compression algorithm specified in the configuration file requires it.
   When decompressing, only -use-map is allowed.
   If a map file is not given, CliZ won't try this optimization.
5. -mask {mask_file_path} (optional)
   The mask file indicates which horizontal positions contain valid data.
   If given, the program uses it to determine which data points are valid.
   If a mask file is not given, CliZ won't use this optimization.
6. -dim{dimension_num} {dimension_name}{dimension_length} {dimension_name}{dimension_length} ...
   This specifies the dimensions of the data.
   {dimension_name} indicates the physical meaning of the dimension.
   Currently supported dimension names:
   - t (time)
   - h (height)
   - lat (latitude)
   - lng (longitude)
   - /*leave out dimension name*/ (no meaning)
   E.g., given "-dim4 12 t120 lat180 lng360" (four dimensions, the first of length 12 with no name), data[i0][i1][i2][i3] = data[i0*120*180*360 + i1*180*360 + i2*360 + i3].
   Dimension information can come from the command line or the configuration file; at least one source must be available.
   If both are available, make sure they match.
7. -type {typename} (optional)
   This specifies the data type.
   Currently supported {typename}:
   - f32
   - /*leave out data type*/ (f32)
   The data type can come from the command line or the configuration file.
   If neither is available, the data type defaults to f32.
   If both are available, make sure they match.
8. -err {error_type} {error_bound}
   This specifies the error bound for the compression.
   Currently supported error types:
   - ABS (absolute error)
   - REL (relative error)
   - /*leave out error type*/ (ABS)
   {error_bound} must be a value greater than 0.
   The error bound can come from the command line or the configuration file.
   If both sources are available, make sure they match.
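
Putting these options together, a compression run that supplies everything on the command line (no configuration or map file, so none of the optional optimizations are attempted) might look like this; the file name and error bound are placeholders:

```sh
# All metadata given on the command line; -type omitted, so it defaults to f32.
./cliz_compress -in CLDHGH.f32 -out CLDHGH.cliz -dim3 t120 lat180 lng360 -err REL 1e-4
```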

To invoke cliz_validate from the command line, use the following options (an example follows the list):
1. -src {source_file_path}
   This is the path to the original data file.
2. -dec {decompressed_file_path}
   This is the path to the decompressed file.
3. -dim{dimension_num} {dimension_name}{dimension_length} {dimension_name}{dimension_length} ...
   This specifies the dimensions of the data.
   {dimension_name} indicates the physical meaning of the dimension.
   Currently supported dimension names:
   - t (time)
   - h (height)
   - lat (latitude)
   - lng (longitude)
   - /*leave out dimension name*/ (no meaning)
   E.g., given "-dim4 12 t120 lat180 lng360", data[i0][i1][i2][i3] = data[i0*120*180*360 + i1*180*360 + i2*360 + i3].
4. -type {typename} (optional)
   This specifies the data type.
   Currently supported {typename}:
   - f32
   - /*leave out data type*/ (f32)
5. -err {error_type} {error_bound}
   This specifies the error bound for the validation.
   Currently supported error types:
   - ABS (absolute error)
   - REL (relative error)
   - /*leave out error type*/ (ABS)
   {error_bound} must be a value greater than 0.
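
For example, checking a decompressed file against its source under the same bound used for compression (a sketch; file names are placeholders):

```sh
./cliz_validate -src ua.f32 -dec ua.dec.f32 -dim3 h60 lat180 lng360 -err ABS 1e-3
```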

-------------------------------

Citation:
Z. Jian et al., "CliZ: Optimizing Lossy Compression for Climate Datasets with Adaptive Fine-tuned Data Prediction," 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), San Francisco, CA, USA, 2024, pp. 417-429, doi: 10.1109/IPDPS57955.2024.00044.

@INPROCEEDINGS{10579165,
  author={Jian, Zizhe and Di, Sheng and Liu, Jinyang and Zhao, Kai and Liang, Xin and Xu, Haiying and Underwood, Robert and Wu, Shixun and Huang, Jiajun and Chen, Zizhong and Cappello, Franck},
  booktitle={2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS)}, 
  title={CliZ: Optimizing Lossy Compression for Climate Datasets with Adaptive Fine-tuned Data Prediction}, 
  year={2024},
  volume={},
  number={},
  pages={417-429},
  keywords={Distributed processing;Costs;Distributed databases;Predictive models;Data transfer;Encoding;Compressors;error-controlled lossy compression;climate datasets;distributed data repository/database},
  doi={10.1109/IPDPS57955.2024.00044}}
