This is an implementation of the paper: Data-Agnostic Cardinality Learning from Imperfect Workloads.
This repo contains:
- 🪐 A simplified PyTorch implementation of GRASP, containing core functionalities of the GRASP system.
- ⚡️ A PyTorch implementation of ArCDF, improving on prior work NeuroCDF.
- 🛸 A self-contained Python file for reproducing the main experiments on CEB-IMDb-full.
- 🛸 A self-contained Python file for reproducing the main experiments on DSB.
- 🎉 A Python script for running the end-to-end query experiments.
- Download the CEB-IMDb-full (i.e., CEB-IMDb-13k) benchmark, and place the entire directory in your IMDB_DIRECTORY in train_grasp_ceb.py.
- The DSB workload is contained in this file.
- Please download and install the modified PostgreSQL from here.
- Download the IMDb dataset from here, and download the populated DSB dataset used in the paper from here.
- Please load the data into PostgreSQL.
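Before training, it can help to confirm that the benchmark directory from the first step is where the training script expects it. The following is a minimal sketch, not part of the repository: the helper name check_benchmark_dir and the example path are hypothetical, and the only assumption about the benchmark is that its directory contains at least one file.

```python
from pathlib import Path

# Hypothetical value: replace with the directory where you placed the
# CEB-IMDb-full benchmark (the same value you set as IMDB_DIRECTORY).
IMDB_DIRECTORY = "/path/to/ceb-imdb-full"


def check_benchmark_dir(directory: str) -> int:
    """Return the number of files found under `directory`.

    Raises FileNotFoundError if the directory does not exist and
    ValueError if it exists but contains no files.
    """
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"Benchmark directory not found: {root}")
    # Count regular files at any depth; the internal layout is not assumed.
    n_files = sum(1 for p in root.rglob("*") if p.is_file())
    if n_files == 0:
        raise ValueError(f"Benchmark directory is empty: {root}")
    return n_files

# Example usage (run once IMDB_DIRECTORY points at a real directory):
#   print(check_benchmark_dir(IMDB_DIRECTORY))
```

This check only verifies that the directory exists and is non-empty; it does not validate the contents of the workload files.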
To train the GRASP model over CEB-IMDb-full, run the following command:
python train_grasp_ceb.py
To train the GRASP model over DSB, run the following command:
python train_grasp_dsb.py
The training scripts can be configured by modifying the parameters in the respective train_grasp_*.py files. Key parameters include:
- epoch: Number of training epochs
- feature_dim: Dimension of CardEst models
- lcs_dim: Dimension of the Learned Count Sketch models
- bs: Batch size
- lr: Learning rate
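As an illustration of how these parameters fit together, here is a minimal sketch that groups them in one place and checks that they are positive. It is not the repository's code: the TrainConfig class and its validate method are hypothetical, and every default value is a placeholder rather than a setting used in the paper. Only the parameter names come from the training scripts.

```python
from dataclasses import dataclass, asdict


@dataclass
class TrainConfig:
    """Hypothetical container for the key training parameters."""

    # All defaults are placeholders, not the values used in the paper.
    epoch: int = 10          # number of training epochs
    feature_dim: int = 64    # dimension of the CardEst models
    lcs_dim: int = 64        # dimension of the Learned Count Sketch models
    bs: int = 128            # batch size
    lr: float = 1e-3         # learning rate

    def validate(self) -> None:
        """Raise ValueError if any parameter is not strictly positive."""
        for name in ("epoch", "feature_dim", "lcs_dim", "bs", "lr"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")


# Override only the values you want to change, then validate.
cfg = TrainConfig(epoch=5, lr=5e-4)
cfg.validate()
print(asdict(cfg))
```

Keeping the parameters in a single object makes it easier to log the exact configuration of each run alongside its results.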
The project includes utility functions and classes located in the CEB_utlities and dsb_utlities directories. These utilities are used for data and workload processing.
This project is licensed under the MIT License. See the LICENSE file for details.
If you have any questions, feel free to contact me through email ([email protected]).