Within this project, we would like to deploy a pipeline to continuously detect vehicles (car, truck, bus) within video frames sent from simulated Unmanned Aerial Vehicles (UAVs).
Each UAV, acting as a producer, continuously captures video frames and sends them to Kafka. Because the pipeline must process a large volume of data under real-time constraints, we use Spark distributed deep learning to detect vehicle bounding boxes and Spark Structured Streaming to read and write the stream. The endpoint of the data is a Kafka output sink, from which consumers load the stream data.
We use the UAVDT benchmark dataset for the simulation and training stages. The distribution of the dataset is shown below.
We chose YOLO (You Only Look Once) as the primary solution because the system needs both real-time performance and high accuracy, with a higher priority on the former. Specifically, we use YOLOv5n for the detection task. We also include other YOLO models, such as YOLOv8s and YOLOv8n, to enable a more comprehensive comparison.
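The detection step can be sketched as follows. Loading YOLOv5n via `torch.hub` and the shape of the detection dictionaries are assumptions on our side, not the repo's exact code; only the three vehicle classes the pipeline targets are taken from the text above:

```python
# The three vehicle classes this pipeline detects.
VEHICLE_CLASSES = {"car", "truck", "bus"}


def filter_vehicle_detections(detections: list[dict]) -> list[dict]:
    """Keep only car/truck/bus detections.

    Each detection is assumed to look like
    {"name": "car", "confidence": 0.9, "box": [x1, y1, x2, y2]}.
    """
    return [d for d in detections if d["name"] in VEHICLE_CLASSES]


def load_yolov5n():
    """Load pretrained YOLOv5n from the Ultralytics hub (needs torch installed)."""
    import torch  # pip install torch
    return torch.hub.load("ultralytics/yolov5", "yolov5n", pretrained=True)
```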
For the weather generator, we used CycleGAN, available here.
The repo was developed on Ubuntu 22.04. Make sure to install compatible packages and prerequisites if you run it on Windows, and watch out for OS-related errors or warnings.
- Python 3.10
- Spark 3.5.1
- Kafka 3.7.1 (Scala 2.12)
We configured 5 partitions for the Kafka topic. Note down the path to your Kafka installation for the setup steps below.
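The 5-partition topic can also be created programmatically. Below is a hedged sketch using kafka-python's admin client; the topic name `uav_frames` and the replication factor of 1 are assumptions for a single-broker setup, not values from the repo:

```python
def topic_spec(name: str, partitions: int = 5, replication: int = 1) -> dict:
    """Describe the topic we want; mirrors the 5-partition setup above."""
    return {
        "name": name,
        "num_partitions": partitions,
        "replication_factor": replication,
    }


def create_topic(bootstrap: str, spec: dict) -> None:
    """Create the topic on a running broker (requires kafka-python)."""
    from kafka.admin import KafkaAdminClient, NewTopic  # pip install kafka-python
    admin = KafkaAdminClient(bootstrap_servers=bootstrap)
    admin.create_topics([NewTopic(
        name=spec["name"],
        num_partitions=spec["num_partitions"],
        replication_factor=spec["replication_factor"],
    )])
```

Equivalently, Kafka's bundled `kafka-topics.sh --create --partitions 5` script does the same from the shell.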
1. Install packages and setup environment
- You can replace the command lines within `start.sh` with the code below and use `bash start.sh` to start the ZooKeeper and Kafka servers:

```
/path/to/your/kafka/bin/zookeeper-server-start.sh /path/to/your/kafka/config/zookeeper.properties &
/path/to/your/kafka/bin/kafka-server-start.sh /path/to/your/kafka/config/server.properties
```

- Try to use a Python virtual environment for a smoother development stage.
```
# Install Python libraries and dependencies
pip install -r requirements.txt

# Create a directory to store the data
mkdir data
```

- Download the UAVDT benchmark dataset here. Put it within the created `data/` folder, then extract it.
2. Start the applications
- First, run `main/app/streaming_app.py` to start the Flask app:

```
python main/app/streaming_app.py
```

- Second, run `main/spark_streaming.py` to start the Spark Structured Streaming query writer:

```
python main/spark_streaming.py
```

- Finally, run the 3 producers within `main/producers`. We simulate a rainy weather situation in `producer_3.py`. You can try it by running `Weather_Effect_Generator/CyclicGAN.py` with the configurable `set_weather` and `folder_weather` variables in `_constants.py`, then changing the `image_folder` in it to `folder_weather`.
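The Spark Structured Streaming query writer started in step 2 roughly follows the shape below. This is a minimal sketch, assuming the topic names, the checkpoint path, and the pass-through transform; the actual `main/spark_streaming.py` runs the detection model between the read and the write:

```python
def kafka_sink_options(bootstrap: str, topic: str, checkpoint: str) -> dict:
    """Options for the Kafka writeStream sink."""
    return {
        "kafka.bootstrap.servers": bootstrap,
        "topic": topic,
        "checkpointLocation": checkpoint,
    }


def build_streaming_query(spark, bootstrap: str, in_topic: str, out_topic: str):
    """Read frames from Kafka and write results to the output sink (needs pyspark)."""
    frames = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", bootstrap)
              .option("subscribe", in_topic)
              .load())
    # ... the YOLO detection step would transform `frames` here ...
    writer = (frames.selectExpr("CAST(key AS STRING) AS key",
                                "CAST(value AS STRING) AS value")
              .writeStream
              .format("kafka"))
    for k, v in kafka_sink_options(bootstrap, out_topic, "/tmp/uav_checkpoint").items():
        writer = writer.option(k, v)
    return writer.start()
```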
- http://localhost:5000: Flask app
- http://localhost:4040: Spark UI
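A consumer at the Kafka output sink can be sketched like this; the topic name `uav_detections`, the JSON record format, and the use of kafka-python are assumptions for illustration:

```python
import json


def parse_detection_record(raw: bytes) -> dict:
    """Decode one JSON detection record from the output topic (assumed format)."""
    return json.loads(raw.decode("utf-8"))


def consume_detections(bootstrap: str, topic: str):
    """Yield parsed records from the output sink (requires kafka-python and a broker)."""
    from kafka import KafkaConsumer  # pip install kafka-python
    consumer = KafkaConsumer(topic, bootstrap_servers=bootstrap,
                             auto_offset_reset="earliest")
    for msg in consumer:
        yield parse_detection_record(msg.value)
```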



