Spark, JupyterLab, and other Data Science tooling via Docker Swarm
- Docker
- docker-compose
- Existing Docker registry for storing images
- Existing caching layer for .deb packages
Edit the .env file.
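The variables in the .env file are repository-specific; the sketch below is purely hypothetical and only illustrates the kind of values such a file typically pins.

```
# Hypothetical example only -- the variable names and values are placeholders,
# not the actual contents of this repository's .env file.
SPARK_VERSION=3.1.2
LIVY_VERSION=0.7.1-incubating
REGISTRY=192.168.1.226:5000
```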
https://docs.docker.com/engine/swarm/swarm-mode/
docker swarm init --advertise-addr=192.168.1.113 --listen-addr=0.0.0.0

Set up the swarm across all nodes using the provided token and command.
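On each of the other nodes, join the swarm with the token and command printed by docker swarm init; the token below is a placeholder.

```
# Run on each worker node; replace <token> with the join token printed
# by `docker swarm init` on the manager (192.168.1.113).
docker swarm join --token <token> 192.168.1.113:2377
```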
Nodes:
- 192.168.1.113 - Asus-Blue (master)
- 192.168.1.145 - Windows WSL2 (worker)
- 192.168.1.105 - Alienware (worker)
- 192.168.1.124 - Laptop (worker)
https://docs.docker.com/engine/swarm/manage-nodes/

Check the status of the swarm cluster:
docker node ls

https://docs.docker.com/engine/swarm/stack-deploy/
Use the existing Docker image registry.

Add the following to your Docker daemon.json file:

{
  "insecure-registries": ["192.168.1.226:5000"]
}
docker service ls

Download Spark and copy it to sparkmaster, sparkworker, and jupyterlab.
Download Livy and copy it to sparkmaster.
Download Spark NLP .jar and copy it to sparkmaster, sparkworker, and jupyterlab.
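As a rough sketch, assuming example versions and the Apache archive mirrors (the actual versions the images expect may differ), the download-and-copy step looks something like this:

```
# Example versions only -- adjust to whatever the images expect.
wget https://archive.apache.org/dist/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
cp spark-3.1.2-bin-hadoop3.2.tgz sparkmaster/
cp spark-3.1.2-bin-hadoop3.2.tgz sparkworker/
cp spark-3.1.2-bin-hadoop3.2.tgz jupyterlab/

wget https://archive.apache.org/dist/incubator/livy/0.7.1-incubating/apache-livy-0.7.1-incubating-bin.zip
cp apache-livy-0.7.1-incubating-bin.zip sparkmaster/

# Spark NLP: fetch the assembly .jar from the Spark NLP releases and copy it
# into sparkmaster/, sparkworker/, and jupyterlab/ the same way.
```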
Build the images, push them to the local registry, and deploy the stack with the provided script.
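The script's contents are not shown here; as a hypothetical sketch, a build/push/deploy script for this layout usually boils down to something like the following (image names, the :latest tag, and the compose file name are assumptions):

```
# Hypothetical sketch -- not the actual build-deploy.sh from this repository.
REGISTRY=192.168.1.226:5000

for img in sparkmaster sparkworker jupyterlab; do
    docker build -t "$REGISTRY/$img:latest" "./$img"   # build from each directory
    docker push "$REGISTRY/$img:latest"                # push so every node can pull it
done

docker stack deploy -c docker-compose.yml spark        # deploy the stack named "spark"
```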
./build-deploy.sh

Check the status of the stack:
docker stack ls
docker stack services spark
Get the full details:
docker stack ps spark --no-trunc

Open up the Spark web UI.
Open up the JupyterLab web UI.
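The exact ports are defined in the compose file, but assuming the usual defaults (8080 for the Spark master UI, 8888 for JupyterLab) on the manager node:

```
# Default ports assumed -- check the compose file / .env for the real ones.
xdg-open http://192.168.1.113:8080   # Spark master web UI
xdg-open http://192.168.1.113:8888   # JupyterLab
```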
Look at the workers as they execute.
Look at the submitter.
Look at Livy.
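One way to do this is to tail the service logs. The service names below are assumptions; docker stack services spark lists the real ones.

```
# Service names are assumptions -- list the actual ones with:
#   docker stack services spark
docker service logs -f spark_sparkworker   # workers as they execute
docker service logs -f spark_sparkmaster   # the master / submitter side
docker service logs -f spark_livy          # Livy
```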
Bring everything down:
docker stack rm spark
docker swarm leave --force

Download newer upstream versions by running ./build-deploy.sh.
Useful monitoring tools:

- nload - for live network usage
- htop - for live CPU and RAM usage
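Both are available as .deb packages; assuming Debian/Ubuntu nodes, they can be installed with apt:

```
# Install the monitoring tools on each node (Debian/Ubuntu)
sudo apt-get update && sudo apt-get install -y nload htop
```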