This repository will guide you through the steps to set up your development environment for learning Spark, PySpark, and related technologies. Please follow the instructions below.
- Navigate to the `install-setup-docker` folder.
- Follow the instructions inside to install Docker Desktop on your system.
- Docker is required for running containers for your PySpark environment.
- Make sure Docker is properly installed and running before proceeding to the next step (a quick check is sketched below).
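As a quick sanity check (a minimal sketch; the exact install steps are in the folder's README), confirm Docker is installed and the daemon is running:

```bash
# Print the installed Docker version
docker --version

# Query the Docker daemon; this fails if Docker Desktop is not running
docker info
```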
- After installing Docker, navigate to the `pyspark-jupyter-lab` folder.
- Use the Dockerfile provided in the folder to create a Docker container for running PySpark in Jupyter Lab.
- The technical lectures on Spark Core, Spark DataFrames, and SparkSQL will be conducted in this containerized environment.
- Instructions to build the Docker container are in the README file in the `pyspark-jupyter-lab` folder (a generic sketch follows below).
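That README has the authoritative build commands; the general shape looks like this (the image tag `pyspark-jupyter` and the port mapping are illustrative assumptions, not the repository's actual values):

```bash
# From inside the pyspark-jupyter-lab folder, build the image
# ("pyspark-jupyter" is an assumed example tag)
docker build -t pyspark-jupyter .

# Run the container, publishing Jupyter Lab's default port 8888
docker run -p 8888:8888 pyspark-jupyter
```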
- Once the container is running, check the logs of the Docker container and open the host link with the token printed there in your browser to access the Jupyter Lab environment where you'll be working with PySpark (see the sketch below).
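To retrieve the tokenized URL from the logs, something like the following works (`<container-id>` comes from the `docker ps` output):

```bash
# List running containers to find the container ID or name
docker ps

# Print the container's logs; Jupyter prints a URL with the access token,
# e.g. http://127.0.0.1:8888/lab?token=...
docker logs <container-id>
```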
- There is a `notes` folder with lecture notes on RDDs, DataFrames, Datasets, and SparkSQL.
- Refer to the README file `README(spark-streaming)` in the `spark-structured-streaming` folder to build the Docker image and run the container.
- There is a `notes` folder with lecture notes.
- `README(lab-one)`, `README(lab-two-streaming)`, and `README(practice-three-streaming)` outline the steps for your non-graded practice on structured streaming. They are not labs to be submitted, but they are mandatory practice (a socket-input sketch follows below).
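Structured-streaming walkthroughs commonly read from a local TCP socket. If the practice notebooks follow Spark's standard socket-source pattern (an assumption — each README is authoritative), you can feed them test input with netcat:

```bash
# Listen on port 9999 and keep the socket open after each connection (-k);
# lines typed here become input records for the streaming query
nc -lk 9999
```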
- Refer to the README file in the `gcp-spark-jupyter-setup` folder to set up Spark with Jupyter on Google Cloud Platform.
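If that setup runs Jupyter on a Compute Engine VM (an assumption — the folder's README is authoritative), a common way to reach the notebook server is SSH port forwarding:

```bash
# SSH to the VM and forward Jupyter's port to your local machine
# ("my-spark-vm" and the zone are illustrative placeholders)
gcloud compute ssh my-spark-vm --zone=us-central1-a -- -L 8888:localhost:8888
```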
- Navigate to the `aws-spark-setup` folder.
- Follow the instructions in the README file to set up Apache Spark on a Hadoop cluster running on an EC2 instance on AWS.
- This setup will allow you to execute distributed Spark jobs on a live cluster (see the `spark-submit` sketch below).
- Work through the README files in order: first `sparkAWSREADME`, then `sparkAWSREADME(cont)`.
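Once the cluster is up, submitting a job generally takes this shape (a sketch assuming YARN as the resource manager, which the Hadoop-based setup implies; the script name is a placeholder):

```bash
# Submit a PySpark application to the YARN cluster
# ("my_job.py" is an illustrative placeholder script)
spark-submit --master yarn --deploy-mode cluster my_job.py
```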
If you encounter any issues or have questions, feel free to reach out in the course discussion forum. Happy learning!