armada-spark

Run Apache Spark workloads on Armada, a multi-cluster Kubernetes batch scheduler.

Overview

armada-spark is an open-source integration designed to streamline deployment and management of Apache Spark workloads on Armada. It provides preconfigured Docker images, tooling for efficient image management, and example workflows to simplify local and production deployments.

Getting Started

Prerequisites

Java 8/11/17
Scala 2.12/2.13
Apache Maven 3.9.6+
(Optional) kind for local clusters
Access to an Armada Server and a Lookout endpoint (see the Armada Operator Quickstart guide)
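
As a quick pre-flight check, the sketch below reports which command-line tools are missing from PATH. The function name `missing_tools` is hypothetical and not part of this repository. It only checks that a command can be found, not that its version matches the list above.

```shell
# Hypothetical helper (not part of armada-spark): print each tool name
# that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    # command -v succeeds only if the shell can resolve the name
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

For example, `missing_tools java mvn kind` prints nothing when all three tools are installed.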

Versions

By default, the project targets Spark 3.5.3 and Scala 2.13.15. To change versions:

./scripts/set-version.sh <spark-version> <scala-version>

Example:

./scripts/set-version.sh 3.5.3 2.13.15
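
The prerequisites list Scala 2.12 and 2.13, so a wrapper could sanity-check the Scala argument before calling the script. The following is a minimal sketch under that assumption. `scala_binary_version` is a hypothetical helper, and whatever validation `set-version.sh` performs itself is not described here.

```shell
# Hypothetical helper: derive the Scala binary version (major.minor)
# from a full version string, e.g. 2.13.15 -> 2.13, and reject anything
# other than the 2.12 and 2.13 lines named in the prerequisites.
scala_binary_version() {
  binary="${1%.*}"   # strip the trailing ".patch" component
  case "$binary" in
    2.12|2.13) printf '%s\n' "$binary" ;;
    *) echo "unsupported Scala version: $1" >&2; return 1 ;;
  esac
}
```

A possible use is `scala_binary_version 2.13.15 >/dev/null && ./scripts/set-version.sh 3.5.3 2.13.15`, which skips the script when the Scala version is outside the supported lines.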

Building Armada Spark

After setting your desired Spark and Scala versions, build the Armada Spark project with Maven by running the following command:

mvn clean package

Building Docker Images

Once your project is built, create the Docker image using:

./scripts/createImage.sh [-i image-name] [-m armada-master-url] [-q armada-queue] [-l armada-lookout-url]

Options:

| Flag | Description | Example |
|------|-------------|---------|
| `-i` | Docker image name | `spark:armada` |
| `-m` | Armada master URL | `armada://localhost:30002` |
| `-q` | Armada queue | `default` |
| `-l` | Armada Lookout URL | `http://localhost:30000` |
| `-p` | Include Python | |
| `-h` | Display help | |

To avoid passing these flags on every run, you can store the values in scripts/config.sh:

export IMAGE_NAME="spark:armada"
export ARMADA_MASTER="armada://localhost:30002"
export ARMADA_QUEUE="default"
export ARMADA_LOOKOUT_URL="http://localhost:30000"
export INCLUDE_PYTHON=true
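
One way to see how these variables correspond to the flags in the Options table is to assemble the command line from them. The sketch below is an assumption about usage, not a description of how `createImage.sh` reads `config.sh` internally. `create_image_cmd` is a hypothetical helper that only prints the command, and its fallback values are the examples from the table.

```shell
# Hypothetical helper: print the createImage.sh invocation implied by the
# config.sh variables, falling back to the example values from the table.
create_image_cmd() {
  cmd="./scripts/createImage.sh"
  cmd="$cmd -i ${IMAGE_NAME:-spark:armada}"
  cmd="$cmd -m ${ARMADA_MASTER:-armada://localhost:30002}"
  cmd="$cmd -q ${ARMADA_QUEUE:-default}"
  cmd="$cmd -l ${ARMADA_LOOKOUT_URL:-http://localhost:30000}"
  # -p takes no argument; add it only when Python support is requested
  if [ "${INCLUDE_PYTHON:-false}" = "true" ]; then
    cmd="$cmd -p"
  fi
  printf '%s\n' "$cmd"
}
```

Printing the command rather than running it makes it easy to review the resolved values before building an image.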

Deployment

We recommend using kind for local testing. If you are using the Armada Operator Quickstart, it is already based on kind.

Run the following command to load the Armada Spark image into your local kind cluster:

kind load docker-image $IMAGE_NAME --name armada
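
The `--name armada` argument assumes that a kind cluster called `armada` exists. As a hedged sketch, the hypothetical helper `has_cluster` below checks a list of cluster names (one per line on standard input, which is the format `kind get clusters` prints) for an exact match before the image is loaded.

```shell
# Hypothetical helper: succeed if the cluster name given as $1 appears as
# an exact, whole-line match in the list read from standard input.
has_cluster() {
  grep -qxF -- "$1"
}

# Intended use (requires kind; not executed here):
#   kind get clusters | has_cluster armada \
#     && kind load docker-image "$IMAGE_NAME" --name armada
```

The whole-line match means a cluster named `armada-dev` does not count as `armada`.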

Development

Before submitting a pull request, please ensure that your code adheres to the project's coding standards and passes all tests.

Testing

To run the unit tests, use the following command:

mvn test

To run the E2E tests, run Armada using the Operator Quickstart guide, then execute:

scripts/test-e2e.sh

Linting

To check the code for linting issues, use the following command:

mvn spotless:check

To automatically apply linting fixes, use:

mvn spotless:apply

E2E

Make sure that the SparkPi job successfully runs on your Armada cluster before submitting a pull request.
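
The checks described in this section can be chained so that a failure stops the run early. The following is a sketch, not a script shipped with the project: `pre_pr_checks`, `run_step`, and the `DRY_RUN` variable are hypothetical, and the order of the steps is an assumption.

```shell
# Hypothetical pre-PR runner: execute lint, unit tests, and E2E tests in
# order, stopping at the first failure. With DRY_RUN=1 the commands are
# printed instead of executed.
run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

pre_pr_checks() {
  run_step mvn spotless:check &&
    run_step mvn test &&
    run_step scripts/test-e2e.sh
}
```

Setting `DRY_RUN=1` before calling `pre_pr_checks` lists the three commands without running Maven or touching the cluster.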

Running Example Workloads

SparkPi Example

The project includes a ready-to-use Spark job to test your setup:

./scripts/submitArmadaSpark.sh

This job reads the same configuration variables (ARMADA_MASTER, ARMADA_QUEUE, ARMADA_LOOKOUT_URL) that are set in scripts/config.sh.

Use the -h option to see what other options are available.
