Make sure that the correct host is selected in the configuration file. Also, provide the data paths for Movies Title, Actors Details, and Principals in the configuration, or place the files in the already created folders.
The solution is containerized and automated using Docker and Make. Use the following commands to run the solution.
First, build and start the containers:
    make build
Execute the solution:
    make run
Execute test cases:
    make run-testcases
Prune/Delete containers:
    make stop
The solution takes approximately 2 minutes to execute with 4 cores and 12 GB of memory.
The distribution graph is stored in resources/graph.
To run the solution locally, open the config file, comment out line 07, uncomment line 08, and then run:
    python runner.py --Remote False
A top-down approach is used in combination with divide-and-conquer. First, the larger data frame is selected and reduced using the filters; the calculations are then performed on the reduced data to achieve better performance.
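The filter-first idea described above can be illustrated with a minimal sketch. It uses plain Python lists of dictionaries as a stand-in for data frames, and every name in it (`count_movie_roles`, the column names, the sample rows) is hypothetical and not taken from the project's code or data. It only shows the ordering: reduce the larger table, reduce the smaller one, then compute on what remains.

```python
from collections import defaultdict

# Hypothetical sample rows; column names are assumptions, not the project's schema.
titles = [
    {"title_id": "t1", "title": "Alpha", "kind": "movie"},
    {"title_id": "t2", "title": "Beta", "kind": "short"},
    {"title_id": "t3", "title": "Gamma", "kind": "movie"},
]
principals = [
    {"title_id": "t1", "person_id": "p1", "category": "actor"},
    {"title_id": "t1", "person_id": "p2", "category": "director"},
    {"title_id": "t2", "person_id": "p1", "category": "actor"},
    {"title_id": "t3", "person_id": "p1", "category": "actor"},
    {"title_id": "t3", "person_id": "p3", "category": "actress"},
]


def count_movie_roles(titles, principals):
    """Count acting credits per person, restricted to titles of kind 'movie'."""
    # Step 1: reduce the larger table (principals) with a cheap filter first.
    acting = [row for row in principals if row["category"] in {"actor", "actress"}]

    # Step 2: reduce the smaller table to the set of keys that matter.
    movie_ids = {t["title_id"] for t in titles if t["kind"] == "movie"}

    # Step 3: run the calculation only on the reduced data.
    counts = defaultdict(int)
    for row in acting:
        if row["title_id"] in movie_ids:
            counts[row["person_id"]] += 1
    return dict(counts)


print(count_movie_roles(titles, principals))
```

Filtering before combining the tables keeps the intermediate data small, which is where most of the savings would come from on large inputs. The same ordering carries over to a data frame library, where the filters would be applied before any join or aggregation.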
For more samples of my work, please visit GitHub
Email: [email protected]