SymRegg - Equality graph Assisted Search Technique for Equation Discovery (Symbolic Regression)

A Python package for symbolic regression using e-graphs. PySymRegg is built on top of the SymRegg algorithm and provides a scikit-learn compatible API for symbolic regression tasks.

This repository provides a CLI and a Python package for SymRegg with a scikit-learn compatible API for symbolic regression.

Instructions:

CLI version
Python version

CLI

How to use

SymRegg - symbolic regression with e-graphs.

Usage: egraphSearch (-d|--dataset INPUT-FILE) [-t|--test ARG] 
                    [-g|--generations GENS] (-a|--algorithm ALG)
                    (-s|--maxSize ARG) [-k|--folds ARG] 
                    [--trace] [--loss ARG] [--opt-iter ARG] 
                    [--opt-retries ARG] [--non-terminals ARG] [--dump-to ARG] 
                    [--load-from ARG]

  Symbolic Regression search algorithm exploiting the potentials of equality
  saturation and e-graphs.

Available options:
  -d,--dataset INPUT-FILE  CSV dataset.
  -t,--test ARG            test data (default: "")
  -g,--generations GENS    Number of generations. (default: 100)
  -a,--algorithm ALG       Algorithm.
  -s,--maxSize ARG         max-size.
  -k,--folds ARG           k-split ratio training-validation (default: 1)
  --trace                  print all evaluated expressions.
  --loss ARG       distribution of the data. (default: Gaussian)
  --opt-iter ARG           number of iterations in parameter optimization.
                           (default: 30)
  --opt-retries ARG        number of retries of parameter fitting. (default: 1)
  --non-terminals ARG      set of non-terminals to use in the search.
                           (default: "Add,Sub,Mul,Div,PowerAbs,Recip")
  --dump-to ARG            dump final e-graph to a file. (default: "")
  --load-from ARG          load initial e-graph from a file. (default: "")
  -h,--help                Show this help text

The dataset file must contain a header with each features name, and the --dataset and --test arguments can be accompanied by arguments separated by ':' following the format:

filename.ext:start_row:end_row:target:features:ynoise

where each ':' field is optional. The fields are:

start_row:end_row is the range of the training rows (default 0:nrows-1). every other row not included in this range will be used as validation
target is either the name of the (if the datafile has headers) or the index of the target variable
features is a comma separated list of names or indices to be used as input variables of the regression model.
ynoise is either the name or the index of the noise / uncertainty information of the target.

Example of valid names: dataset.csv, mydata.tsv, dataset.csv:20:100, dataset.tsv:20:100:price:m2,rooms,neighborhood, dataset.csv:::5:0,1,2.

The format of the file will be determined by the extension (e.g., csv, tsv,...).

Installation

To install SymRegg you'll need:

libz
libnlopt
libgmp
ghc-9.6.6 or higher
cabal

Method 1: PIP

Simply run:

pip install symregg

under your Python environment.

Method 2: cabal

After installing the dependencies (e.g., apt install libz libnlopt libgmp), install ghcup

For Linux, macOS, FreeBSD or WSL2:

curl --proto '=https' --tlsv1.2 -sSf https://get-ghcup.haskell.org | sh

For Windows, run the following in a PowerShell:

Set-ExecutionPolicy Bypass -Scope Process -Force;[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; try { & ([ScriptBlock]::Create((Invoke-WebRequest https://www.haskell.org/ghcup/sh/bootstrap-haskell.ps1 -UseBasicParsing))) -Interactive -DisableCurl } catch { Write-Error $_ }

After the installation, run ghcup tui and install the latest stack or cabal together with ghc-9.6.6 (select the items and press i). To install symregg simply run:

cabal install

Python

Installation

pip install pysymregg

Features

Scikit-learn compatible API with fit() and predict() methods
Support for multiple optimization algorithms
Flexible function set selection
Various loss functions for different problem types
Parameter optimization with multiple restarts
Ability to save and load e-graphs

Usage

Basic Example

from symregg import SymRegg
import numpy as np

# Create sample data
X = np.linspace(-10, 10, 100).reshape(-1, 1)
y = 2 * X.ravel() + 3 * np.sin(X.ravel()) + np.random.normal(0, 1, 100)

# Create and fit the model
model = SymRegg(gen=100, nonterminals="add,sub,mul,div,sin,cos")
model.fit(X, y)

# Make predictions
y_pred = model.predict(X)

# Examine the results
print(model.results)

Integration with scikit-learn

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from symregg import SymRegg

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create and fit model
model = SymRegg(gen=150, optIter=100)
model.fit(X_train, y_train)

# Evaluate on test set
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Test MSE: {mse}")

Parameters

Parameter	Type	Default	Description
`gen`	int	100	Number of generations to run
`alg`	str	"BestFirst"	Algorithm type: "BestFirst" or "OnlyRandom"
`maxSize`	int	15	Maximum allowed size for expressions (max 100)
`nonterminals`	str	"add,sub,mul,div"	Comma-separated list of allowed functions
`loss`	str	"MSE"	Loss function: "MSE", "Gaussian", "Bernoulli", or "Poisson"
`optIter`	int	50	Number of iterations for parameter optimization
`optRepeat`	int	2	Number of restarts for parameter optimization
`nParams`	int	-1	Maximum number of parameters (-1 for unlimited)
`folds`	int	1	Data splitting ratio for validation
`trace`	bool	False	Whether to return all visited expressions instead of the Pareto front
`dumpTo`	str	""	Filename to save the final e-graph
`loadFrom`	str	""	Filename to load an e-graph to resume search

Available Functions

The following functions can be used in the nonterminals parameter:

Basic operations: add, sub, mul, div
Powers: power, powerabs, square, cube
Roots: sqrt, sqrtabs, cbrt
Trigonometric: sin, cos, tan, asin, acos, atan
Hyperbolic: sinh, cosh, tanh, asinh, acosh, atanh
Others: abs, log, logabs, exp, recip, aq (analytical quotient)

Methods

fit(X, y, Xerr, yerr): Fits the symbolic regression model with optional uncertainty information for X, y.
fit_mvsr(Xs, ys, Xerrs, yerrs): Fits the symbolic regression model using multi-view SR where each argument is a list of numpy arrays describing multiple samples.
predict(X): Generates predictions using the best model
predict_mvsr(X, view): counter part of predict for multi-view. Must specify the view.
score(X, y): Computes R² score of the best model
evaluate_best_model(X): Evaluates the best model on the given data
evaluate_best_model_mvsr(X, view): Counterpart for multi-view. Must specify the view.
evaluate_model(ix, X): Evaluates the model with index ix on the given data
evaluate_model_mvsr(ix, X, view): Counterpart for multi-view. Must specify the view.

Results

After fitting, the results attribute contains a pandas DataFrame with details about the discovered models, including:

Mathematical expressions
Model complexity
Parameter values
Error metrics
NumPy-compatible expressions

License

LICENSE

Citation

If you use symregg in your research, please cite https://arxiv.org/abs/2501.17848:

@inbook{10.1145/3712256.3726383,
    author = {de Fran\c{c}a, Fabricio Olivetti and Kronberger, Gabriel},
    title = {Improving Genetic Programming for Symbolic Regression with Equality Graphs},
    year = {2025},
    publisher = {Association for Computing Machinery},
    booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference},
    pages = {989–998},
    numpages = {10}
}

Acknowledgments

The bindings were created following the amazing example written by wenkokke

Fabricio Olivetti de Franca is supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) grant 301596/2022-0.

Gabriel Kronberger is supported by the Austrian Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology, the Federal Ministry for Labour and Economy, and the regional government of Upper Austria within the COMET project ProMetHeus (904919) supported by the Austrian Research Promotion Agency (FFG).

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
.github		.github
scripts		scripts
src		src
test		test
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.python-version		.python-version
ChangeLog.md		ChangeLog.md
LICENSE		LICENSE
README.md		README.md
Setup.hs		Setup.hs
cabal.project		cabal.project
pyproject.toml		pyproject.toml
requirements-ci.txt		requirements-ci.txt
setup.py		setup.py
symregg.cabal		symregg.cabal

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SymRegg - Equality graph Assisted Search Technique for Equation Discovery (Symbolic Regression)

CLI

How to use

Installation

Method 1: PIP

Method 2: cabal

Python

Installation

Features

Usage

Basic Example

Integration with scikit-learn

Parameters

Available Functions

Methods

Results

License

Citation

Acknowledgments

About

Uh oh!

Releases 5

Packages

Uh oh!

Contributors 3

Uh oh!

Languages

License

folivetti/symregg

Folders and files

Latest commit

History

Repository files navigation

SymRegg - Equality graph Assisted Search Technique for Equation Discovery (Symbolic Regression)

CLI

How to use

Installation

Method 1: PIP

Method 2: cabal

Python

Installation

Features

Usage

Basic Example

Integration with scikit-learn

Parameters

Available Functions

Methods

Results

License

Citation

Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 5

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages