Benchmarking Scripts

Basic Structure

The */benchmark/ folder is organized as follows:

Folder	Description
`common/`	common scripts sourced by other benchmarking scripts
`datasets/`	`load-file.sh` and format converters
`<system>/`	system-specific scripts (including Hadoop)
`parsers/`	log checker and log parser

File	Description
`init-all.sh`	initializes Hadoop and HDFS, then calls each system's `init.sh`
`bench-all.sh`	runs each system's `benchall.sh` script
`local-init.sh`	initializes Hadoop, HDFS, and all systems for local testing

Common Scripts

The scripts in */benchmark/common do the following:

File	Description
`bench-finish.sh`	completes a run by pulling logs from the worker machines (to the master) and terminating `sar`/`free` instances
`bench-init.sh`	starts a run by starting `sar`/`free` instances on all machines
`cleanup-bench.sh`	terminates `sar`/`free` instances on all machines (intended to be used manually, in the event `bench-finish.sh` fails to run)
`get-configs.sh`	contains configuration settings for each of the systems: Giraph and GPS max JVM heap size, number of threads per worker, number of workers per machine
`get-dirs.sh`	specifies directories/locations of system folders
`ssh-check.sh`	checks SSH connectability to worker machines

For get-configs.sh, if GIRAPH_XMX is updated, re-initialize and restart Hadoop to put the change into effect (this is also done automatically by */benchmark/init-all.sh and */benchmark/local-init.sh). If MIZAN_WPM is changed, be sure to re-run Mizan's graph partitioner.

Note: Paths in get-dirs.sh are only used by the benchmarking scripts. Some systems still expect, e.g., paths in ~/.bashrc.

Dataset Scripts

See Dataset Scripts.

Hadoop Scripts

Hadoop has two scripts:

File	Description
`init.sh`	generates the required Hadoop config files (masters, slaves, core-site.xml, hdfs-site.xml, mapred-site.xml) and copies them to worker machines
`restart-hadoop.sh`	restarts Hadoop; specifying 1 as an argument will make it wait until HDFS is up (i.e., `hadoop dfsadmin -safemode wait`)

System-specific Scripts

All systems (giraph, gps, graphlab, mizan) have:

File	Description
`benchall.sh`	system-specific batch benching script (only for 4, 8, 16, 32, 64, 128 machines)
`<alg>.sh`	algorithm-specific benching script (where `<alg>` is pagerank, sssp, wcc, mst, or dimest)
`recompile-<system>.sh`	recompiles the system

Here, mst is DMST and dimest is diameter estimation. Look in benchall.sh to see how a particular <alg>.sh script should be used (or run ./<alg>.sh to get a usage message). These scripts must be run from their system directory: cd */benchmark/<system> before running ./<alg>.sh.

In general, the <alg>.sh scripts require at least input-graph, the name of the input file (e.g., orkut-adj.txt), and machines, the number of slave/worker machines. For GraphLab, input-graph can be a directory (e.g., orkut-adj-split/) containing the split parts of an input file. (See also: Datasets.)
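As a minimal sketch of the "run from the system directory" rule, the helper below changes into the system folder inside a subshell and then invokes the algorithm script. The function name `run_alg` and its first parameter (the path to your benchmark folder) are hypothetical and not part of the repository. All remaining arguments are passed through unchanged, so consult the script's usage message for what it actually expects.

```bash
# Hypothetical helper (not part of the repository): run <alg>.sh from
# inside its system directory, as the scripts require.
run_alg() {
    local bench_root system alg
    bench_root=$1   # path to the benchmark folder (assumption: supplied by caller)
    system=$2       # giraph, gps, graphlab, or mizan
    alg=$3          # pagerank, sssp, wcc, mst, or dimest
    shift 3
    # Use a subshell so the caller's working directory is left untouched.
    (
        cd "$bench_root/$system" || exit 1
        "./$alg.sh" "$@"
    )
}
```

For example, `run_alg "$BENCH_ROOT" giraph pagerank orkut-adj.txt 16` would be equivalent to `cd "$BENCH_ROOT/giraph"` followed by `./pagerank.sh orkut-adj.txt 16`, where `BENCH_ROOT` is a placeholder variable you would set yourself.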

Note: Mizan's graph partitioner ("premizan") must be run before running its <alg>.sh.

Each system also has its own additional scripts.

Giraph

File	Description
`prtolfinder.sh`	finds PageRank error tolerance used in GraphLab
`kill-java-job.sh`	cleans up lingering Java instances on worker machines

GPS

File	Description
`debug-site.sh`	starts GPS's debug monitoring webpage (`http://<public-ip>:4444/debugmonitoring.html`)
`disable-dimest-fix.sh`	disables a patch that fixes GPS to run with diameter estimation and recompiles GPS (patch is disabled by default)
`enable-dimest-fix.sh`	enables the dimest patch and recompiles GPS
`init.sh`	initializes GPS by creating `slaves` and uploading `machine.cfg` to HDFS
`start-nodes.sh`	starts GPS master and workers (replaces GPS's `*/gps-rev-110/master-scripts/start_gps_nodes.sh`)
`stop-nodes.sh`	stops GPS master and workers (replaces GPS's `*/gps-rev-110/master-scripts/stop_gps_nodes.sh`)

GraphLab

File	Description
`init.sh`	initializes GraphLab by creating `machines` file

Mizan

File	Description
`init.sh`	initializes Mizan by creating `slaves` file
`premizan.sh`	runs Mizan's separate prepartitioner (must be done before running any algorithms)

Log Files

The naming convention for log files is

<algorithm>_<graph>_<num-machines>_<mode>_<date-time>[_<machine-id>]_<stat>.txt

where

algorithm can be pagerank, sssp, wcc, mst, dimest, or premizan
graph is the full filename of the graph (e.g., orkut-mst-adj.txt)
mode is system-dependent (see below)
date-time is yyyymmdd-hhmmss (e.g., 20140501-123040)
stat can be time, mem, nbt, cpu, or net

When stat is time, there is no machine-id. For all other statistics, machine-id ranges from 0 (the master) to the total number of slave/worker machines. Specifically, mem, cpu, and net are per-second memory/CPU/network usage from free and sar, while nbt is the content of /proc/net/dev before and after a run.
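To illustrate the naming convention, here is a small sketch that splits a log filename into its fields with a single extended regular expression. The function name `parse_log_name` and the `key=value` output format are illustrative choices, not something the repository provides. It assumes that mode is a single digit and that date-time always has the yyyymmdd-hhmmss shape, which is what lets the graph name contain dots and hyphens without ambiguity.

```bash
# Hypothetical helper: split a log filename into its fields.
# Prints one line of key=value pairs, or nothing if the name does not match.
parse_log_name() {
    local name
    name=${1##*/}   # drop any leading directory
    # Groups: 1=algorithm 2=graph 3=num-machines 4=mode 5=date-time
    #         7=machine-id (optional, empty for "time" logs) 8=stat
    printf '%s\n' "$name" | sed -nE \
        's/^([a-z]+)_(.+)_([0-9]+)_([0-9])_([0-9]{8}-[0-9]{6})(_([0-9]+))?_(time|mem|nbt|cpu|net)\.txt$/algorithm=\1 graph=\2 machines=\3 mode=\4 datetime=\5 machine_id=\7 stat=\8/p'
}
```

Calling `parse_log_name sssp_orkut-adj.txt_16_1_20140501-123040_3_mem.txt` would report machine 3 of a 16-machine SSSP run in mode 1. For a time log, the `machine_id` field is left empty.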

mode is a system-specific digit specified as follows:

For Giraph, 0 is byte array, 1 is hashmap.
For GPS, 0 is none (no optimizations), 1 is LALP, 2 is dynamic migration, 3 is LALP + dynamic.
For GraphLab, 0 is synchronous, 1 is asynchronous.
For Mizan, 0 is static, 1 is delayed, 2 is mixed (the latter two do not work). Additionally, for premizan, 1 is hash partitioning while 2 is range partitioning.
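The mode digits above can be captured in a small lookup, sketched here as a `case` statement. The function name `describe_mode` is hypothetical. Note that premizan is treated as its own key because its digits mean something different from Mizan's algorithm runs.

```bash
# Hypothetical helper: translate a (system, mode digit) pair into the
# description given in the list above.
describe_mode() {
    case "$1:$2" in
        giraph:0)   echo "byte array" ;;
        giraph:1)   echo "hashmap" ;;
        gps:0)      echo "none (no optimizations)" ;;
        gps:1)      echo "LALP" ;;
        gps:2)      echo "dynamic migration" ;;
        gps:3)      echo "LALP + dynamic" ;;
        graphlab:0) echo "synchronous" ;;
        graphlab:1) echo "asynchronous" ;;
        mizan:0)    echo "static" ;;
        mizan:1)    echo "delayed" ;;
        mizan:2)    echo "mixed" ;;
        premizan:1) echo "hash partitioning" ;;
        premizan:2) echo "range partitioning" ;;
        *)          echo "unknown"; return 1 ;;
    esac
}
```

For example, `describe_mode gps 3` prints `LALP + dynamic`, and an unrecognized pair prints `unknown` with a non-zero exit status.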

Location and Parsing

All log files are located in */benchmark/<system>/logs/.

Logs can be parsed "interactively" and collectively for raw data (for plotting).

You can also check for missing logs caused by failed runs.
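The repository ships its own log checker in parsers/. Purely as an independent illustration of how the naming convention can be used to spot gaps, the sketch below lists the filenames one run would be expected to produce and reports those absent from a directory. The function names `expected_logs` and `missing_logs` are hypothetical, and the sketch assumes every machine (ids 0 through the number of workers) writes all four per-machine statistics.

```bash
# Hypothetical helpers, not the repository's log checker.
# prefix is <algorithm>_<graph>_<num-machines>_<mode>_<date-time>.
expected_logs() {
    local prefix machines id stat
    prefix=$1; machines=$2
    echo "${prefix}_time.txt"          # the time log has no machine-id
    id=0
    while [ "$id" -le "$machines" ]; do    # 0 is the master
        for stat in mem nbt cpu net; do
            echo "${prefix}_${id}_${stat}.txt"
        done
        id=$((id + 1))
    done
}

# Print every expected log that does not exist in the given directory.
missing_logs() {
    local dir prefix machines f
    dir=$1; prefix=$2; machines=$3
    expected_logs "$prefix" "$machines" | while IFS= read -r f; do
        [ -e "$dir/$f" ] || echo "$f"
    done
}
```

With 2 workers this enumerates 13 names: one time log plus four statistics for each of machines 0, 1, and 2. An empty result from `missing_logs` then suggests the run left a complete set of logs under the assumption above.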
