TraceLinking Implementations with their Verified Designs (Evaluation)

This artifact combines the dataset from our evaluation of TraceLink with scripts for processing that data. The packaged data is reported and discussed in our publication "TraceLinking Implementations with their Verified Designs", presented at OOPSLA 2025.

WARNING: this artifact will consume up to 400GB of disk space.

The scripts in this artifact reproduce part of Table 1, all of Table 3, and all of Table 4 in the paper. Table 2 is a qualitative summary. All other data presented was manually collected, and the two .ods files contain the recorded workings behind the summary values we present. See the "Inventory" section for a more detailed summary.

Note that this artifact includes a compiled .jar for commit c376f7c4ea3e1537bdb4e2c6584dfbf614a54e52 of the PGo compiler (with TraceLink features). The full source code, including that specific commit, is available at https://github.com/distCompiler/pgo.

Hardware Dependencies

As far as possible, this artifact is runnable as .jar files and portable Scala scripts, which should run on any CPU architecture for which a Java Virtual Machine is available. That said, our evaluation environment was an Ubuntu 24.04 server running on an x86_64 processor.

Getting Started

The .devcontainer/ folder specifies a Docker environment, which can be automatically opened using the Visual Studio Code Dev Containers extension, or other compatible tools. All documentation assumes this environment.

To access the image using Docker Compose instead, run the following command at the artifact root:

docker compose run --user $(id -u) --rm tracelink-artifact

This will provide a shell with all dependencies installed.

If you prefer a different setup, the Dockerfile identifies the necessary dependencies (a recent JVM such as OpenJDK 21, and the scala-cli tool).
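
For example, on an Ubuntu-like host you might install them roughly as follows (a sketch only; the package name and the Coursier-based scala-cli install are our assumptions, not steps taken from this artifact):

# JVM from the distribution packages (assumes apt; adjust for your distribution)
sudo apt-get install openjdk-21-jdk
# scala-cli via Coursier, assuming the cs launcher is already installed
cs install scala-cli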

Running the commands below will regenerate all final data from our study in a matter of minutes, but this relies on precomputed files included in the artifact. Forward links to the next section are included where relevant.

For a basic functionality test, run this script:

scala-cli run . --main-class process_data_sc

It will recreate 2 .csv files, bug_metrics.csv and validation_metrics.csv. validation_metrics.csv contains the "events per trace" column of Table 1. bug_metrics.csv is Table 3 in the paper. With no additional configuration, it will not regenerate all_data.csv. To regenerate that file (WARNING: takes multiple hours), see the next section.

Additionally, you can run this command:

scala-cli run . --main-class compression_ratio_sc

It will recreate compression_ratio.csv, which is Table 4 in the paper.

A third script can be run as a check; on the unmodified artifact it has no effect:

scala-cli run . --main-class experiments_sc

It should output a log consisting only of "skipping" entries, indicating that it found results already present for all the experiments and did nothing. To re-run these experiments (WARNING: takes a week of compute time), see the next section.

How to Fully Reproduce (Step by Step Guide)

The scripts in the previous section rely on precomputed data bundled within the artifact. This is a guide on how to recompute that data.

WARNING: these scripts can take between hours and days to run. See the corresponding heading for a time estimate.

all_data.csv (needs multiple hours)

The true dataset lives in the systems/ subfolder; all_data.csv holds information that was parsed from the .txt files there via a slow process.

To regenerate all_data.csv, delete or rename the original file, and then run this:

scala-cli run . --main-class process_data_sc

It will launch with no feedback and run for hours. To see its progress, tail the new all_data.csv file that it is progressively writing. When it is done, you should have a similar CSV. It may not be identical to the original, because the script uses multithreading to compute independent rows in parallel: rows may appear in a different order, but their contents should be the same. Re-running this script should not affect experimental results.
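
As a concrete sketch (the backup name and the sort-based comparison are our own suggestions, not part of the artifact's tooling):

# keep the original for later comparison
mv all_data.csv all_data.csv.orig
scala-cli run . --main-class process_data_sc
# in a second terminal, watch progress as rows are appended
tail -f all_data.csv
# after completion, compare the two files ignoring row order (bash process substitution)
diff <(sort all_data.csv) <(sort all_data.csv.orig)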

experiments.sc (needs 5+ days compute time)

This file actually runs the performance benchmarks. If you want to keep the original evaluation data, do not rerun it. Unlike the other scripts, which can run on any machine, this one records execution times, so all of its data must come from the same (or an identical) otherwise-unloaded machine. If you rerun this script, make sure to rerun the previous steps as well.

In all cases, the command is:

scala-cli run . --main-class experiments_sc

If you run it on an unchanged copy of the artifact, it will output a long list of files that it is skipping and then exit. This sanity check is the expected behavior described in "Getting Started".

To have it do any work, at least one of the listed files must be missing. The easiest way to arrange this without disrupting the data is to take a folder, like systems/raftkvs/traces_found_1/1223752761977714937, and duplicate it with a name like systems/raftkvs/traces_found_1/dummy. In the duplicate folder, delete all files that do not end in .log. If you then launch the script, it will re-run all experiments for, in this case, a recorded 1-node RaftKVS execution trace. If you try this, make sure to delete the dummy subfolder once you have confirmed things are working, or you will have added extra measurements to the dataset.
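
Concretely, the steps above look something like this (paths follow the example above; the use of find to prune non-.log files is our suggestion):

cp -r systems/raftkvs/traces_found_1/1223752761977714937 systems/raftkvs/traces_found_1/dummy
# keep only the recorded .log files in the duplicate
find systems/raftkvs/traces_found_1/dummy -type f ! -name '*.log' -delete
scala-cli run . --main-class experiments_sc
# once satisfied, remove the duplicate so its measurements do not join the dataset
rm -r systems/raftkvs/traces_found_1/dummy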

If you want to completely rerun the evaluation, delete all the .txt files under systems/ and launch the script. It will run many variations of sometimes lengthy model checking tasks, and is expected to take 5+ days of non-stop processing, perhaps longer depending on your system.
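
A sketch of a complete rerun (the find invocation is our suggestion; it removes every generated .txt log under systems/ before relaunching):

find systems -type f -name '*.txt' -delete
scala-cli run . --main-class experiments_sc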

Fortunately, as implied above, the system skips existing data, so it is possible to kill the process and then start it again. Do note, however, that killing the process may leave a partial file on disk. Such files will be flagged as errors by process_data.sc (due to missing information in the file), at which point you can delete the partial file and re-run experiments.sc to regenerate it correctly.

To run the evaluation on multiple servers at once, copy the artifact onto multiple servers, and follow the commented instructions at the top of the recordResults function definition in experiments.sc. This will generate additional execution logs starting with 2_ and 3_, which will be picked up correctly by the analysis scripts, as long as they are eventually merged into one directory tree (via scp, for example). We originally ran 3 parallel copies of the evaluation (the setup does not generalize to 4 or more).

Reusability Guide

This artifact's primary re-use goal is to help evaluate the verification-time performance of any implementation of TraceLink. Beyond that, it is designed to be as workload-, system-, and implementation-agnostic as possible, while remaining a relatively simple set of scripts.

An alternate implementation of TraceLink can replace pgo.jar. In that case, removing all systems/**.txt files and re-running the experiments.sc script followed by process_data.sc will produce a new version of Table 3 in the paper (bug_metrics.csv). Once experiments.sc is run to completion, compression_ratio.sc will regenerate a new version of Table 4 from the paper (compression_ratio.csv).
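
A sketch of that sequence (the jar path is a placeholder; removing all_data.csv is our inference from the caching behavior described elsewhere in this README, so that stale cached rows are not reused):

cp /path/to/your/tracelink.jar pgo.jar
find systems -type f -name '*.txt' -delete
# drop the cached summary so it is rebuilt from the new logs (our inference)
rm -f all_data.csv
scala-cli run . --main-class experiments_sc
scala-cli run . --main-class process_data_sc
scala-cli run . --main-class compression_ratio_sc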

New trace files can also be provided. As long as the systems/system_name/traces_folder/*.log structure is preserved, and the appropriate .tla files are present in systems/, it is possible to evaluate a completely new set of traces.

With some additional coding, more analytics routines can be added to process_data.sc in order to report other metrics about validation performance in a new .csv file. The preliminary output scraping is abstracted away and cached in all_data.csv, so it's possible to rerun the script while editing various statistics, without having to wait on pre-processing each time.

To add a new system or workload to the evaluation, the main change is to add more validation calls at the end of experiments.sc. It may also be necessary to edit some data-specific correctness assertions and field definitions in process_data.sc, which are based on specific expectations regarding how many data points should exist in the gathered data, as well as workload naming conventions.

Inventory

flowchart LR
    sys(("systems/**"))
    process_data_sc_1["process_data.sc"]
    process_data_sc_2["process_data.sc"]
    compression_ratio_sc["compression_ratio.sc"]
    all_data_csv("all_data.csv")
    bug_metrics("bug_metrics.csv (Table 3)")
    validation_metrics("validation_metrics.csv (Table 1, events per trace)")
    compression_ratio_csv("compression_ratio.csv (Table 4)")

    sys --> process_data_sc_1 --> all_data_csv
    all_data_csv --> process_data_sc_2
    process_data_sc_2 --> bug_metrics
    process_data_sc_2 --> validation_metrics

    sys --> compression_ratio_sc --> compression_ratio_csv


The packaged data lives in 2 places: the systems/ folder structure, and a summary all_data.csv.

This artifact vendors pgo.jar, a runnable build of the PGo compiler (https://github.com/distCompiler/pgo), which includes all TraceLink features, and an embedded version of the TLA+ tools. The specific TLA+ tools and community modules jars we used in our experiments are archived in the .tools/ subfolder.

As mentioned in the intro, the .devcontainer/ subfolder allows this artifact to be automatically opened in a Visual Studio Code Dev Container with suitable versions of all dependencies.

The following CSV files are available:

  • bug_metrics.csv matches Table 3 in the paper.
  • validation_metrics.csv summarizes model checking runtimes irrespective of whether a bug was found, which is the source of the "events per trace" column of Table 1 in the paper.
  • compression_ratio.csv matches Table 4 in the paper.
  • all_data.csv is a summary of model checking metrics, which acts as a cache for the lengthy process of extracting data from systems/.

The following scripts are available:

  • process_data.sc populates all_data.csv if necessary, and then generates bug_metrics.csv and validation_metrics.csv.
  • compression_ratio.sc compares the size of naive and compressed TLA+, generating compression_ratio.csv.
  • experiments.sc generates all the systems/**.txt logs, if they are missing, by re-running TraceLink on the stored .log files.
  • build_zip.sh rebuilds the artifact .zip file from this folder, including specifically the files intended for upload. This is the only Linux dependent part of the artifact, and it is not needed except for republishing. It requires the zip program and bash.
  • project.scala is not a script. It lists library dependencies shared by the scripts.

The following .ods spreadsheets are available, recording data that we collected manually:

  • tlc_tlink_overheads.ods compares having the TLC model checker trace one path through a model, with exploring a path of the same length using TraceLink. This was done by comparing manual re-runs of the original specification (with a depth limit to match log length) and TraceLink (already limited to an exact depth).
  • tracing_overheads.ods compares running a PGo-compiled system with TraceLink instrumentation enabled and disabled. This was done by re-running the same workloads as in the rest of the evaluation (which are integration tests with different node counts) and manually recording wall-clock execution time.

How to Regenerate Log Files

This artifact is built to work with a pre-gathered set of logs. To gather your own, check out a copy of the PGo compiler (https://github.com/distCompiler/pgo). The main branch is suitable for evaluation, and includes TraceLink. It also bundles the systems whose execution traces are stored in this artifact.

To gather traces, use PGo's harvest-traces command on any of that repository's systems/dqueue, systems/locksvc, or systems/raftkvs systems.

For example, to harvest executions from RaftKVS in a 3 node configuration, use this command at the PGo repository root (notice the traces_found_3 naming convention):

./pgo.sh harvest-traces --traces-needed 1 --traces-sub-folder traces_found_3 systems/raftkvs -- go test -run TestSafety_ThreeServers

This will automatically gather as many non-duplicate traces as requested (1 here). These traces will be stored in the indicated subfolder, at systems/raftkvs/traces_found_3/<unique identifier>/. At this stage, the data can be copied into this artifact's folder structure and processed.
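
For example, assuming the artifact already contains a systems/raftkvs/traces_found_3/ folder (paths are placeholders for wherever you checked out PGo and this artifact):

cp -r /path/to/pgo/systems/raftkvs/traces_found_3/* /path/to/tracelink-artifact/systems/raftkvs/traces_found_3/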

There is no documented process to recreate traces of system bugs; please contact the authors if this is of interest.
