rEGGression (r🥚ression) - Nonlinear regression models exploration and query system with e-graphs (egg).

r🥚ression an interactive tool that can help SR users to explore alternative models generated from different sources. These sources can be: the final population of a single run, the Pareto front, the entire history of visited expressions during the search, or a combination of those sources from multiple runs of the same or different algorithms. This can provide a rich library of alternative expressions generated from different biases, induced by different hyper-parameters or algorithms, that can bring invaluable information about the data. This tool supports simple queries such as querying for the top-N models filtered by size, complexity, and number of numerical parameters; insert and evaluate new expressions; list and evaluate sub-expressions of the already visited expressions, and also, more advanced queries such as calculate the frequency of common patterns (i.e., building blocks) observed in the set of models; and filter the expressions by patterns with a natural syntax.

This repository provides a CLI and a Python package for rEGGression with a scikit-learn compatible API for symbolic regression.

Instructions:

Changelog

v1.0.9

allow empty e-graph in Python wrapper

v1.0.8

fixed slowdown in top with patterns
added modularity command to detect modular equations
added eqsat command to run a simplified version of equality saturation

v1.0.7

fixed distributionOfTokens
added Numpy column in dataframe

v1.0.6

included method importFromCSV to import equations from other SR algorithms
fixed bug that may create fake duplicates in e-graph
added top-n option in distributionOfTokens
fixed bug in pattern matching

v1.0.2

Fused the CLI tool and Python wrapper, you can install both with pip install. The executable name is reggression.
Improved Python interface
Added the command distribution-tokens (method distributionOfTokens in Python) that shows the distributions of tokens and average fitness (requested by @gbomarito)
Added the command extract-pattern (method extractPattern in Python) that shows all the patterns that can be extracted from a single expression (idea from @juliareuter)

CLI

How to use

r🥚ression - Nonlinear regression models exploration and query system with
e-graphs (egg).

Usage: reggression (-d|--dataset INPUT-FILE) [-t|--test ARG] 
                   [--distribution ARG] [--dump-to ARG] [--load-from ARG] 
                   [--parse-csv ARG] [--convert ARG] [--parse-parameters] 
                   [--to ARG] [--calculate-dl]

  Exploration and query system for a database of regression models using
  e-graphs.

Available options:
  -d,--dataset INPUT-FILE  CSV dataset.
  -t,--test ARG            test data (default: "")
  --distribution ARG       distribution of the data. (default: Gaussian)
  --dump-to ARG            dump final e-graph to a file. (default: "")
  --load-from ARG          load initial e-graph from a file. (default: "")
  --parse-csv ARG          parse-csv CSV file with the format
                           expression,parameters,fitness. The fitness value
                           should be maximization and the parameters a ;
                           separated list (there must be an additional parameter
                           for sigma in the Gaussian distribution). The format
                           of the equation is determined by the extension of the
                           file, supported extensions are operon, pysr, tir,
                           itea, hl (heuristiclab), gomea, feat, etc.
                           (default: "")
  --convert ARG            convert FROM TO, converts equation format from a
                           given format (see 'parse-csv') to either 'math' or
                           'numpy'. The 'math' format is compatible with the tir
                           format, so you can use this to standardize the
                           equations from multiple sources into a single file.
                           The output will be written to stdout. (default: "")
  --parse-parameters       Extract the numerical parameters from the expression.
                           In this case the csv file should be formatted as
                           "equation,error,fitness, where 'error' is the error
                           term used in Gaussia likelihood, it can be empty if
                           using other distributions."
  --to ARG                 Format to convert to. (default: MATH)
  --calculate-dl           (re)calculate DL.
  -h,--help                Show this help text

The dataset file must contain a header with each features name, and the --dataset and --test arguments can be accompanied by arguments separated by ':' following the format:

filename.ext:start_row:end_row:target:features

where each ':' field is optional. The fields are:

start_row:end_row is the range of the training rows (default 0:nrows-1). every other row not included in this range will be used as validation
target is either the name of the (if the datafile has headers) or the index of the target variable
features is a comma separated list of names or indices to be used as input variables of the regression model.

Example of valid names: dataset.csv, mydata.tsv, dataset.csv:20:100, dataset.tsv:20:100:price:m2,rooms,neighborhood, dataset.csv:::5:0,1,2.

The format of the file will be determined by the extension (e.g., csv, tsv,...).

Demo

Installation

To install rEGGression you'll need:

libz
libnlopt
libgmp
ghc-9.6.6
cabal or stack

Method 1: PIP

Simply run:

pip install reggression

under your Python environment.

Method 2: cabal

After installing the dependencies (e.g., apt install libz libnlopt libgmp), install ghcup

For Linux, macOS, FreeBSD or WSL2:

curl --proto '=https' --tlsv1.2 -sSf https://get-ghcup.haskell.org | sh

For Windows, run the following in a PowerShell:

Set-ExecutionPolicy Bypass -Scope Process -Force;[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; try { & ([ScriptBlock]::Create((Invoke-WebRequest https://www.haskell.org/ghcup/sh/bootstrap-haskell.ps1 -UseBasicParsing))) -Interactive -DisableCurl } catch { Write-Error $_ }

After the installation, run ghcup tui and install the latest stack or cabal together with ghc-9.6.6 (select the items and press i). To install srsimplify simply run:

cabal install

Python

Features

Load and analyze symbolic regression models from e-graph files
Import expressions from various symbolic regression tools (TIR, HeuristicLab, Operon, etc.)
Query expressions by patterns, size, parameters, and complexity
Analyze expression distributions and patterns
Extract Pareto fronts of accuracy vs. expression size
Support for different loss functions for various problem types
Optimization and reporting capabilities

Usage

You can find a Jupyter Notebook with examples here

Basic Usage

from reggression import Reggression

# Load from an existing e-graph file
egg = Reggression(
    dataset="train_data.csv", 
    loadFrom="my_models.egraph"
)

# Get the top 10 expressions by fitness
top_models = egg.top(10)
print(top_models)

# Save the e-graph to a new file
egg.save("updated_models.egraph")

Importing from Other Symbolic Regression Tools

from reggression import Reggression

# Import expressions from a CSV file generated by another tool
egg = Reggression(
    dataset="train_data.csv",
    parseCSV="operon_results.operon",  # File extension indicates the source tool
    parseParams=True
)

# Get the top models
best_models = egg.top(5)
print(best_models)

Pattern Matching and Filtering

from reggression import Reggression

egg = Reggression(dataset="train_data.csv", loadFrom="models.egraph")

# Find expressions with specific characteristics
filtered = egg.top(
    n=10,
    filters=["size < 15", "parameters <= 3"],
    criteria="fitness",
    pattern="v0 * x0",  # Match any expression multiplied by x0
    isRoot=False
)
print(filtered)

# Count occurrences of a pattern
count = egg.countPattern("sin(v0)")
print(f"Number of expressions containing sine: {count}")

Distribution Analysis

from reggression import Reggression

egg = Reggression(dataset="train_data.csv", loadFrom="models.egraph")

# Analyze pattern distribution
dist = egg.distribution(
    filters=["size <= 10"],
    limitedAt=20,
    dsc=True,
    byFitness=True,
    atLeast=100,
    fromTop=5000
)
print(dist)

Testing on New Data

from reggression import Reggression

egg = Reggression(
    dataset="train_data.csv",
    testData="test_data.csv",
    loadFrom="models.egraph"
)

# Get the top models evaluated on test data
test_results = egg.top(10)
print(test_results)

Parameters

Parameter	Type	Default	Description
`dataset`	str	required	Filename of the training dataset in CSV format
`testData`	str	""	Filename of the test dataset in CSV format
`loss`	str	"MSE"	Loss function: "MSE", "Gaussian", "Bernoulli", or "Poisson"
`loadFrom`	str	""	Filename of an e-graph to load
`parseCSV`	str	""	CSV file with expressions from another tool
`parseParams`	bool	True	Whether to extract parameter values from expressions

Supported File Extensions for parseCSV

PyReggression can import expressions from various symbolic regression tools:

.tir - TIR and ITEA
.hl - HeuristicLab
.operon - Operon
.bingo - BINGO
.gomea - GP-GOMEA
.pysr - PySR
.sbp - SBP
.eplex - EPLEX, FEAT, BRUSH

Methods

Query Methods

top(n, filters, criteria, pattern, isRoot, negate): Returns top expressions by criteria
distribution(filters, limitedAt, dsc, byFitness, atLeast, fromTop): Returns pattern distribution
countPattern(pattern): Counts occurrences of a pattern
pareto(byFitness): Returns the Pareto front of accuracy vs. size

Analysis Methods

report(n): Detailed report of e-class n
optimize(n): Re-optimize parameters for e-class n
subtrees(n): Return subtrees of e-class n

Manipulation Methods

insert(expr): Insert a new expression
save(fname): Save the e-graph file
load(fname): Load an e-graph file
runQuery(query, df): Run a custom query against the e-graph

Pattern Syntax

Pattern matching uses the following syntax:

x0, x1, ... - Input variables
t0, t1, ... - Model parameters
v0, v1, ... - Pattern variables (match any expression)

Examples:

t0 * x0 - Match exactly this expression
v0 * x0 - Match any expression multiplied by x0
sin(v0) - Match sine of any expression
v0 + v1 - Match any addition
v0 + x0 * v1^v0 - Match an expression v0 added to x0 multiplied by the expression v1 to the power of v0. E.g., t0*x1 + x0 * t1^(t0*x1)

License

[LICENSE]

Citation

If you use PyReggression in your research, please cite:

@inproceedings{rEGGression,
author = {de Franca, Fabricio Olivetti and Kronberger, Gabriel},
title = {rEGGression: an Interactive and Agnostic Tool for the Exploration of Symbolic Regression Models},
year = {2025},
isbn = {9798400714658},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3712256.3726385},
doi = {10.1145/3712256.3726385},
booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference},
pages = {},
numpages = {9},
keywords = {Genetic programming, Symbolic regression, Equality saturation, e-graphs},
location = {Malaga, Spain},
series = {GECCO '25},
archivePrefix = {arXiv},
       eprint = {2501.17859},
 primaryClass = {cs.LG},
}

Acknowledgments

The bindings were created following the amazing example written by wenkokke

Fabricio Olivetti de Franca is supported by Conselho Nacional de Desenvolvimento Cient'{i}fico e Tecnol'{o}gico (CNPq) grant 301596/2022-0.

Gabriel Kronberger is supported by the Austrian Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology, the Federal Ministry for Labour and Economy, and the regional government of Upper Austria within the COMET project ProMetHeus (904919) supported by the Austrian Research Promotion Agency (FFG).

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
.github		.github
scripts		scripts
src		src
test		test
tutorial		tutorial
.gitignore		.gitignore
.history		.history
.pre-commit-config.yaml		.pre-commit-config.yaml
.python-version		.python-version
LICENSE		LICENSE
README.md		README.md
Setup.hs		Setup.hs
cabal.project		cabal.project
pyproject.toml		pyproject.toml
reggression.cabal		reggression.cabal
requirements-ci.txt		requirements-ci.txt
setup.py		setup.py

License

folivetti/reggression

Folders and files

Latest commit

History

Repository files navigation

rEGGression (r🥚ression) - Nonlinear regression models exploration and query system with e-graphs (egg).

Changelog

v1.0.9

v1.0.8

v1.0.7

v1.0.6

v1.0.2

CLI

How to use

Demo

Installation

Method 1: PIP

Method 2: cabal

Python

Features

Usage

Basic Usage

Importing from Other Symbolic Regression Tools

Pattern Matching and Filtering

Distribution Analysis

Testing on New Data

Parameters

Supported File Extensions for parseCSV

Methods

Query Methods

Analysis Methods

Manipulation Methods

Pattern Syntax

License

Citation

Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 10

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages