feat: Blase NS multi-machine batch #1690

bollu · 2025-09-22T14:09:22Z

We will use `snakemake`'s Kubernetes executor:
https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/kubernetes.html

along with Namespace's Kubernetes cluster support:

https://namespace.so/docs/reference/github-actions/nscloud-cluster-action

to run our snakemake setup directly on a Namespace cluster.
The open question for me is where this will store the build artifact. Currently, our evaluation run uploads a ZIP file to Zulip. We could instead upload this ZIP file as a build artifact on GitHub.
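A minimal sketch of the packaging step, assuming the evaluation writes its outputs into a single directory: it bundles that directory into one ZIP file that a later workflow step could upload as a build artifact. The helper name `bundle_results`, the `results` directory, and the `evaluation` archive stem are hypothetical and do not come from this PR.

```python
import shutil
import tempfile
from pathlib import Path


def bundle_results(results_dir: Path, out_stem: Path) -> Path:
    """Zip everything under results_dir into <out_stem>.zip and return its path."""
    # make_archive appends the ".zip" suffix itself and returns the archive path.
    archive = shutil.make_archive(str(out_stem), "zip", root_dir=str(results_dir))
    return Path(archive)


# Demo on a throwaway directory; the CSV name is only a placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    results = Path(tmp) / "results"
    results.mkdir()
    (results / "instcombine_ceg_data.csv").write_text("file,solved\n")
    archive = bundle_results(results, Path(tmp) / "evaluation")
    print(archive.name)  # evaluation.zip
```

Producing one archive at a known path keeps the upload step independent of where the results end up: the same file could be sent to Zulip or attached to the workflow run.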

github-actions · 2025-09-22T14:32:51Z

bv_decide solved 0 theorems.
bitwuzla solved 0 theorems.
bv_decide found 0 counterexamples.
bitwuzla found 0 counterexamples.
bv_decide only failed on 0 problems.
bitwuzla only failed on 0 problems.
both bitwuzla and bv_decide failed on 0 problems.
In total, bitwuzla saw 0 problems.
In total, bv_decide saw 0 problems.
ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS
The InstCombine benchmark contains 4520 theorems in total.
Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv
all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan
mean of percentage stddev/av: nan%
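The `nan` entries in this output are consistent with the run having seen 0 problems: an average or standard deviation over an empty set of timings is undefined. A minimal sketch of that behaviour, assuming the summary is computed from a list of per-file solve times; the `summarize` helper is hypothetical and is not the evaluation script's actual code.

```python
import math
import statistics


def summarize(times):
    """Return (avg, stddev) for a list of timings, using NaN where undefined."""
    if not times:
        # No measurements at all: both statistics are undefined.
        return math.nan, math.nan
    avg = statistics.fmean(times)
    # The sample standard deviation needs at least two data points.
    stddev = statistics.stdev(times) if len(times) > 1 else math.nan
    return avg, stddev


avg, stddev = summarize([])
print(f"avg: {avg} | stddev: {stddev}")  # avg: nan | stddev: nan
```

Seen this way, a summary full of `nan` points at an empty input (no solver results were collected), not at a failure in the statistics themselves.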

github-actions · 2025-09-24T07:54:56Z

bv_decide solved 0 theorems.
bitwuzla solved 0 theorems.
bv_decide found 0 counterexamples.
bitwuzla found 0 counterexamples.
bv_decide only failed on 0 problems.
bitwuzla only failed on 0 problems.
both bitwuzla and bv_decide failed on 0 problems.
In total, bitwuzla saw 0 problems.
In total, bv_decide saw 0 problems.
ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS
The InstCombine benchmark contains 4520 theorems in total.
Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv
all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan
mean of percentage stddev/av: nan%

github-actions · 2025-09-24T07:55:59Z

bv_decide solved 0 theorems.
bitwuzla solved 0 theorems.
bv_decide found 0 counterexamples.
bitwuzla found 0 counterexamples.
bv_decide only failed on 0 problems.
bitwuzla only failed on 0 problems.
both bitwuzla and bv_decide failed on 0 problems.
In total, bitwuzla saw 0 problems.
In total, bv_decide saw 0 problems.
ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS
The InstCombine benchmark contains 4520 theorems in total.
Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv
all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan
mean of percentage stddev/av: nan%

github-actions · 2025-09-24T07:59:27Z

bv_decide solved 0 theorems.
bitwuzla solved 0 theorems.
bv_decide found 0 counterexamples.
bitwuzla found 0 counterexamples.
bv_decide only failed on 0 problems.
bitwuzla only failed on 0 problems.
both bitwuzla and bv_decide failed on 0 problems.
In total, bitwuzla saw 0 problems.
In total, bv_decide saw 0 problems.
ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS
The InstCombine benchmark contains 4520 theorems in total.
Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv
all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan
mean of percentage stddev/av: nan%

github-actions · 2025-09-24T08:42:05Z

bv_decide solved 0 theorems.
bitwuzla solved 0 theorems.
bv_decide found 0 counterexamples.
bitwuzla found 0 counterexamples.
bv_decide only failed on 0 problems.
bitwuzla only failed on 0 problems.
both bitwuzla and bv_decide failed on 0 problems.
In total, bitwuzla saw 0 problems.
In total, bv_decide saw 0 problems.
ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS
The InstCombine benchmark contains 4520 theorems in total.
Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv
all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan
mean of percentage stddev/av: nan%

github-actions · 2025-09-24T08:43:49Z

bv_decide solved 0 theorems.
bitwuzla solved 0 theorems.
bv_decide found 0 counterexamples.
bitwuzla found 0 counterexamples.
bv_decide only failed on 0 problems.
bitwuzla only failed on 0 problems.
both bitwuzla and bv_decide failed on 0 problems.
In total, bitwuzla saw 0 problems.
In total, bv_decide saw 0 problems.
ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS
The InstCombine benchmark contains 4520 theorems in total.
Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv
all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan
mean of percentage stddev/av: nan%

github-actions · 2025-09-24T08:57:35Z

bv_decide solved 0 theorems.
bitwuzla solved 0 theorems.
bv_decide found 0 counterexamples.
bitwuzla found 0 counterexamples.
bv_decide only failed on 0 problems.
bitwuzla only failed on 0 problems.
both bitwuzla and bv_decide failed on 0 problems.
In total, bitwuzla saw 0 problems.
In total, bv_decide saw 0 problems.
ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS
The InstCombine benchmark contains 4520 theorems in total.
Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv
all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan
mean of percentage stddev/av: nan%

github-actions · 2025-09-24T09:11:50Z

bv_decide solved 0 theorems.
bitwuzla solved 0 theorems.
bv_decide found 0 counterexamples.
bitwuzla found 0 counterexamples.
bv_decide only failed on 0 problems.
bitwuzla only failed on 0 problems.
both bitwuzla and bv_decide failed on 0 problems.
In total, bitwuzla saw 0 problems.
In total, bv_decide saw 0 problems.
ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS
The InstCombine benchmark contains 4520 theorems in total.
Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv
all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan
mean of percentage stddev/av: nan%

github-actions · 2025-09-24T09:19:28Z

bv_decide solved 0 theorems.
bitwuzla solved 0 theorems.
bv_decide found 0 counterexamples.
bitwuzla found 0 counterexamples.
bv_decide only failed on 0 problems.
bitwuzla only failed on 0 problems.
both bitwuzla and bv_decide failed on 0 problems.
In total, bitwuzla saw 0 problems.
In total, bv_decide saw 0 problems.
ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS
The InstCombine benchmark contains 4520 theorems in total.
Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv
all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan
mean of percentage stddev/av: nan%

github-actions · 2025-09-24T09:24:30Z

bv_decide solved 0 theorems.
bitwuzla solved 0 theorems.
bv_decide found 0 counterexamples.
bitwuzla found 0 counterexamples.
bv_decide only failed on 0 problems.
bitwuzla only failed on 0 problems.
both bitwuzla and bv_decide failed on 0 problems.
In total, bitwuzla saw 0 problems.
In total, bv_decide saw 0 problems.
ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS
The InstCombine benchmark contains 4520 theorems in total.
Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv
all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan
mean of percentage stddev/av: nan%

github-actions · 2025-09-26T09:02:04Z

bv_decide solved 0 theorems.
bitwuzla solved 0 theorems.
bv_decide found 0 counterexamples.
bitwuzla found 0 counterexamples.
bv_decide only failed on 0 problems.
bitwuzla only failed on 0 problems.
both bitwuzla and bv_decide failed on 0 problems.
In total, bitwuzla saw 0 problems.
In total, bv_decide saw 0 problems.
ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS
The InstCombine benchmark contains 4520 theorems in total.
Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv
all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan
mean of percentage stddev/av: nan%

github-actions · 2025-09-26T09:03:30Z

bv_decide solved 0 theorems.
bitwuzla solved 0 theorems.
bv_decide found 0 counterexamples.
bitwuzla found 0 counterexamples.
bv_decide only failed on 0 problems.
bitwuzla only failed on 0 problems.
both bitwuzla and bv_decide failed on 0 problems.
In total, bitwuzla saw 0 problems.
In total, bv_decide saw 0 problems.
ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS
The InstCombine benchmark contains 4520 theorems in total.
Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv
all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan
mean of percentage stddev/av: nan%

github-actions · 2025-09-26T09:04:58Z

bv_decide solved 0 theorems.
bitwuzla solved 0 theorems.
bv_decide found 0 counterexamples.
bitwuzla found 0 counterexamples.
bv_decide only failed on 0 problems.
bitwuzla only failed on 0 problems.
both bitwuzla and bv_decide failed on 0 problems.
In total, bitwuzla saw 0 problems.
In total, bv_decide saw 0 problems.
ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS
The InstCombine benchmark contains 4520 theorems in total.
Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv
all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan
mean of percentage stddev/av: nan%

github-actions · 2025-09-26T09:06:50Z

bv_decide solved 0 theorems.
bitwuzla solved 0 theorems.
bv_decide found 0 counterexamples.
bitwuzla found 0 counterexamples.
bv_decide only failed on 0 problems.
bitwuzla only failed on 0 problems.
both bitwuzla and bv_decide failed on 0 problems.
In total, bitwuzla saw 0 problems.
In total, bv_decide saw 0 problems.
ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS
The InstCombine benchmark contains 4520 theorems in total.
Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv
all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan
mean of percentage stddev/av: nan%

bollu · 2025-10-07T22:39:54Z

!blase

bollu · 2025-10-07T22:45:38Z

!blase

github-actions · 2025-10-07T22:57:13Z

bv_decide solved 0 theorems.
bitwuzla solved 0 theorems.
bv_decide found 0 counterexamples.
bitwuzla found 0 counterexamples.
bv_decide only failed on 0 problems.
bitwuzla only failed on 0 problems.
both bitwuzla and bv_decide failed on 0 problems.
In total, bitwuzla saw 0 problems.
In total, bv_decide saw 0 problems.
ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS
The InstCombine benchmark contains 4520 theorems in total.
Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv
all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan
mean of percentage stddev/av: nan%

github-actions · 2025-10-07T23:05:25Z

bv_decide solved 0 theorems.
bitwuzla solved 0 theorems.
bv_decide found 0 counterexamples.
bitwuzla found 0 counterexamples.
bv_decide only failed on 0 problems.
bitwuzla only failed on 0 problems.
both bitwuzla and bv_decide failed on 0 problems.
In total, bitwuzla saw 0 problems.
In total, bv_decide saw 0 problems.
ran rg 'LeanSAT provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla provided a counter' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'LeanSAT proved' | wc -l, this file found 0, rg found 0, SUCCESS
ran rg 'Bitwuzla proved' | wc -l, this file found 0, rg found 0, SUCCESS
The InstCombine benchmark contains 4520 theorems in total.
Saved dataframe at: /home/runner/work/lean-mlir/lean-mlir/bv-evaluation/raw-data/InstCombine/instcombine_ceg_data.csv
all_files_solved_bitwuzla_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_rw_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_bb_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_sat_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratt_times_stddev avg: nan | stddev: nan
all_files_solved_bv_decide_lratc_times_stddev avg: nan | stddev: nan
mean of percentage stddev/av: nan%

bollu added 3 commits September 24, 2025 08:34

chore: change checkout action to be the correct namespace action

1886069

chore: fix URL

4cd5312

chore; target master

273f104

bollu added 4 commits September 24, 2025 09:20

chore: add NS evaluation running script

4206abb

chore: not sure where checkout clones stuff

bff05b0

chore: curl uv

704cfa3

chore: use run-on-host to build with docker

86c9002

chore: ignore snakemake

d8dc3a0

chore: fix paths

cd6b321

bollu added 2 commits September 26, 2025 10:35

chore: update

ba8e196

chore: fix yaml

dec9368

bollu changed the title ~~feat: Blase evaluation via Namespace~~ feat: Blase NS multi-machine batch Sep 26, 2025

bollu added 5 commits September 26, 2025 10:42

chore: fix yaml 2

f931e69

chore: fix yaml 3

d288737

chore: remove other GH action files so that this PR only spends time running the action I care about

b305539

chore: remove other GH action files so that this PR only spends time running the action I care about

b1eae4f

Merge remote-tracking branch 'origin/main' into blase-ns-evaluation

5d1cb00

bollu added 13 commits September 26, 2025 11:14

chore: rename to cluster-scale eval to be clear

374c42c

chore: move lake cache to before

chore: lake build before cache lake

51a74a7

chore: run lake cache before calling lake build, d'oh

a752449

chore: cache venv, fix sed

72701bc

chore: fix upload

2762848

chore: fix version of download action

c2721fe

chore: gotta figure paths out

45e2bfc

chore: pass nbatch

c2ad308

chore: switch to using full config, not tiny config

ad783ae

chore: add comment on how this can be improved

73c6426

Merge remote-tracking branch 'origin/main' into blase-ns-evaluation

3485d61

chore: update runner

06076a9

chore: bring back stuff

f0c7795

chore: add cluster runner

a8169bc

chore: add abstracter

771292e
