This repository contains Python scripts for analyzing temperature data obtained from the Digital Urban Climate Twin (DUCT) simulator as well as some ancillary files read by these scripts as input.
The main analysis code is in the 'analysis' class in the analysis.py file. This class provides methods for processing the temperature data.
First, the input data for the user-specified run and time is read from a file, which can be either a pickled Pandas DataFrame ('.pkl') or a raster ('.tif'). Next, this data is processed to compute quantities such as the average, minimum and/or maximum temperature within a user-specified region. The results can then be visualized either as a contour plot of the spatial temperature distribution or as a plot of the average temperature within a user-specified region over the course of the simulation time.
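As a rough illustration of the processing step, the sketch below computes the average, minimum and maximum temperature inside a rectangular region of a DataFrame. It is not the code from analysis.py: the column names ('x', 'y', 'temperature'), the function name region_stats and the sample values are all assumptions made for the example. A real run would load the DataFrame from the '.pkl' file instead, for example with pandas.read_pickle.

```python
import pandas as pd

# Hypothetical layout: one row per grid cell, with coordinates in metres and
# a temperature in degrees Celsius. The real DUCT output columns may differ.
df = pd.DataFrame({
    "x": [0.0, 300.0, 600.0, 0.0, 300.0, 600.0],
    "y": [0.0, 0.0, 0.0, 300.0, 300.0, 300.0],
    "temperature": [29.0, 30.0, 31.0, 32.0, 33.0, 34.0],
})


def region_stats(df, xmin, xmax, ymin, ymax):
    """Return the mean, min and max temperature inside a rectangular region.

    The region bounds are inclusive on all sides.
    """
    in_region = df["x"].between(xmin, xmax) & df["y"].between(ymin, ymax)
    temps = df.loc[in_region, "temperature"]
    return {"mean": temps.mean(), "min": temps.min(), "max": temps.max()}


# Statistics for the four cells in the lower-left 300 m x 300 m corner.
stats = region_stats(df, xmin=0.0, xmax=300.0, ymin=0.0, ymax=300.0)
print(stats)
```

Repeating this call for each simulation time step would give the series needed for the plot of average temperature over time.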
The 'analysis' class also uses the DBSCAN clustering algorithm implemented in the scikit-learn library to identify clusters of locations whose temperature exceeds a user-specified threshold at a user-specified time. DBSCAN is favoured over other clustering algorithms such as k-means because it takes as an input parameter the maximum distance at which two points are still considered neighbours, and it does not need the number of clusters, which is not known in advance for this use case. Here, the maximum distance between neighbouring points is known to be around 300 metres, since this is the grid resolution of the raster data output by the DUCT simulator. The agglomerative clustering algorithm in scikit-learn can also be given a distance threshold in place of the number of clusters, but that threshold is a linkage distance between clusters rather than a per-point neighbourhood radius.
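The clustering step can be sketched as follows. This is a minimal example under stated assumptions, not the repository's implementation: the coordinates, temperatures, threshold and min_samples value are invented, and eps is set slightly above the 300 m grid spacing so that directly adjacent cells are linked despite floating-point rounding while diagonal neighbours (about 424 m apart) are not.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical grid-cell centres (metres) and temperatures (degrees Celsius)
# on a 300 m grid. These values are made up for illustration.
xy = np.array([
    [0, 0], [300, 0], [0, 300],    # first hot patch
    [3000, 3000], [3300, 3000],    # second hot patch
    [1500, 1500],                  # isolated hot cell
    [600, 0],                      # cool cell, removed by the threshold
], dtype=float)
temp = np.array([35.0, 36.0, 35.5, 37.0, 36.5, 35.2, 30.0])

# Keep only the locations hotter than the (assumed) threshold.
threshold = 34.0
hot_xy = xy[temp > threshold]

# eps: maximum distance for two points to count as neighbours (assumed margin
# of 5% over the grid spacing). min_samples=2 means a cell needs at least one
# hot neighbour besides itself to start a cluster.
labels = DBSCAN(eps=315.0, min_samples=2).fit_predict(hot_xy)

# DBSCAN marks points that belong to no cluster with the label -1.
n_clusters = len(set(labels.tolist()) - {-1})
print(labels, n_clusters)
```

The number of clusters is an output here rather than an input, which is the property the paragraph above relies on.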