This project was part of the "Data Mining" course at the University of Trento.
A company is responsible for delivering different kinds of merchandise among different cities. To do so, it hires drivers and assigns them specific routes to follow. For instance: go from Verona to Trento and deliver 10 apples, then go to Venezia and deliver 4 bananas and 3 pens, etc. The problem is that drivers often add or omit cities and merchandise at will, which causes a considerable discrepancy between what they are told to do (standard routes) and what they actually do (actual routes). Our job is to solve the following three tasks, so that the company can use the results to minimize this discrepancy:
- Propose better standard routes
- For each driver, sort the existing standard routes according to his preferences
- Propose an ideal standard route for every driver
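To make the discrepancy between standard and actual routes concrete, here is a minimal sketch that reports which cities and items a driver added or omitted. It assumes a hypothetical representation in which a route is a list of (from_city, to_city, {item: quantity}) trips; the project's JSON files may be structured differently, and the helper names are illustrative only.

```python
from collections import Counter

# Hypothetical representation (not the project's JSON schema):
# a trip is (from_city, to_city, {item: quantity}); a route is a list of trips.
Trip = tuple[str, str, dict[str, int]]


def cities(route: list[Trip]) -> list[str]:
    """Cities visited along the route, in order of first appearance."""
    seen: list[str] = []
    for src, dst, _ in route:
        for city in (src, dst):
            if city not in seen:
                seen.append(city)
    return seen


def merchandise(route: list[Trip]) -> Counter:
    """Total quantity of each item delivered along the route."""
    total: Counter = Counter()
    for _, _, items in route:
        total.update(items)
    return total


def discrepancy(standard: list[Trip], actual: list[Trip]) -> dict:
    """Cities and items the driver added or omitted relative to the standard route."""
    std_cities, act_cities = set(cities(standard)), set(cities(actual))
    std_goods, act_goods = merchandise(standard), merchandise(actual)
    return {
        "added_cities": sorted(act_cities - std_cities),
        "omitted_cities": sorted(std_cities - act_cities),
        # Counter subtraction keeps only positive differences
        "extra_items": dict(act_goods - std_goods),
        "missing_items": dict(std_goods - act_goods),
    }


standard = [("Verona", "Trento", {"apples": 10}),
            ("Trento", "Venezia", {"bananas": 4, "pens": 3})]
actual = [("Verona", "Trento", {"apples": 10}),
          ("Trento", "Padova", {"bananas": 4, "pens": 1})]
report = discrepancy(standard, actual)
print(report)
```

In this example the driver skipped Venezia, went to Padova instead, and delivered 2 fewer pens than planned.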
The routemining.py file contains the functions needed to find new standard routes, sort the routes for a driver, and find the best route for a driver, plus createfiles, which writes the output file for each task.
The file can be imported as a Python module, after which its functions can be called directly.
findroutes(filename, limit_data=0, driver_id=0, prints=False)
It finds new standard routes based on data in a JSON file. The output is a list of recommended routes.
filename is the path to the JSON file containing the actual routes.
limit_data limits the number of routes read from the JSON file; the routes are sampled randomly without replacement. If 0 (the default), all routes are used.
driver_id can be set to restrict the data to the routes driven by one specific driver.
prints can be set to True to show the progress of the function.
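The limit_data and driver_id parameters describe a filtering step applied before any mining. The sketch below shows that step under stated assumptions: the JSON file holds a list of route objects with a "driver" field, and driver_id=0 means "all drivers". select_routes and load_routes are hypothetical names, and whether the real code filters by driver before or after sampling is not documented.

```python
import json
import random


def select_routes(routes, limit=0, driver_id=0, seed=None):
    """Filter by driver (assumed "driver" field), then optionally sample.

    driver_id=0 is treated as "keep every driver" (assumed meaning of the default).
    limit=0 means "no limit"; otherwise routes are sampled without replacement.
    """
    if driver_id:
        routes = [r for r in routes if r.get("driver") == driver_id]
    if limit and limit < len(routes):
        # random.Random.sample draws without replacement: no route appears twice
        routes = random.Random(seed).sample(routes, limit)
    return list(routes)


def load_routes(filename, limit=0, driver_id=0, seed=None):
    """Read a JSON list of routes from filename and apply select_routes."""
    with open(filename, encoding="utf-8") as fh:
        return select_routes(json.load(fh), limit, driver_id, seed)


# Ten toy routes alternating between two hypothetical drivers
routes = [{"id": i, "driver": "D1" if i % 2 else "D2"} for i in range(10)]
sample = select_routes(routes, limit=3, driver_id="D1", seed=0)
print(sample)
```

Passing a seed makes the random subset reproducible, which helps when comparing runs with the same limit.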
rankroutes(actual_routes_file, routes_to_sort_file, driver_id, limit_actual_routes=0, limit_routes_to_sort=0, prints=False, numperm3=128, findbest=0)
It sorts the routes in routes_to_sort_file according to the driver's preferences, as reflected in actual_routes_file. The output is a list of the ids of the sorted routes, in order of preference and limited to the top 5.
actual_routes_file is the path to the JSON file containing the actual routes.
routes_to_sort_file is the path to the JSON file containing the routes that need to be sorted.
limit_actual_routes limits the number of routes read from the actual routes JSON file; the routes are sampled randomly without replacement. If 0 (the default), all routes are used.
limit_routes_to_sort limits the number of routes read from the routes-to-sort JSON file; the routes are sampled randomly without replacement. If 0 (the default), all routes are used.
driver_id identifies the driver; only the actual routes driven by this driver are used.
prints can be set to True to show the progress of the function.
numperm3 specifies the number of permutations (default 128).
findbest is 0 by default; when set to 1, routes_to_sort_file must be a list of routes instead of the path to a JSON file.
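The ranking criterion used by rankroutes is not documented here. As a hedged illustration of its input/output contract (a driver's actual routes in, at most 5 route ids out, best first), the sketch below scores each candidate by its mean Jaccard similarity to the driver's actual routes, comparing only the sets of cities. rank_routes, the city-list route format, and the Jaccard criterion are all assumptions, not the project's method.

```python
def jaccard(a, b) -> float:
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rank_routes(actual_routes, candidates, top_k=5):
    """Return the ids of the top_k candidate routes, best first.

    actual_routes: the driver's actual routes, each a list of cities (assumed format).
    candidates: dict mapping route id -> list of cities (assumed format).
    A candidate's score is its mean Jaccard similarity to the actual routes.
    """
    if not actual_routes:
        return list(candidates)[:top_k]

    def score(route_id):
        route = candidates[route_id]
        return sum(jaccard(route, r) for r in actual_routes) / len(actual_routes)

    # sorted() is stable, so ties keep the candidates' original order
    return sorted(candidates, key=score, reverse=True)[:top_k]


actual = [["Verona", "Trento", "Venezia"], ["Verona", "Trento", "Padova"]]
candidates = {
    "s1": ["Milano", "Torino"],
    "s2": ["Verona", "Trento"],
    "s3": ["Verona", "Trento", "Venezia", "Padova"],
}
print(rank_routes(actual, candidates))
```

Here s3 scores 0.75, s2 scores about 0.67 and s1 scores 0, so the ranking is s3, s2, s1.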
findbestroute(actual_routes_file, driver_id, limit_data=0, prints=False, printsalot=False, numperm=128)
It finds the best route for the driver based on the data in actual_routes_file. The output is a single route that is considered the best.
actual_routes_file is the path to the JSON file containing the actual routes.
limit_data limits the number of routes read from the actual routes JSON file; the routes are sampled randomly without replacement. If 0 (the default), all routes are used.
driver_id identifies the driver; only the actual routes driven by this driver are used.
prints can be set to True to show the progress of the function.
printsalot can be set to True to show more detailed output.
numperm specifies the number of permutations (default 128).
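numperm here (and numperm3 in rankroutes) is only described as a number of permutations. A default of 128 is a common choice for MinHash signatures (for example, the datasketch library's MinHash defaults to num_perm=128), which suggests route similarity may be estimated with MinHash; that is an inference, not something this README confirms. The sketch below shows what the parameter would control in that case. It uses seeded blake2b hashes in place of true permutations, a standard simplification, and the function names are hypothetical.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    """64-bit hash of item under the seed-th hash function."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_perm: int = 128) -> list[int]:
    """For each of num_perm hash functions, keep the minimum hash over the items."""
    items = set(items)
    if not items:
        raise ValueError("cannot build a MinHash signature of an empty set")
    return [min(_hash(seed, item) for item in items) for seed in range(num_perm)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of signature positions on which the two sets agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must use the same number of permutations")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


route_a = [f"city{i}" for i in range(30)]      # cities 0..29
route_b = [f"city{i}" for i in range(10, 40)]  # cities 10..39, true Jaccard = 20/40 = 0.5
sig_a = minhash_signature(route_a)
sig_b = minhash_signature(route_b)
print(estimated_jaccard(sig_a, sig_b))
```

Each signature position agrees with probability equal to the true Jaccard similarity J, so the estimate's standard error is roughly sqrt(J(1-J)/num_perm): more permutations give a tighter estimate at the cost of proportionally more hashing.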
createfiles(filename_actual, filename_sort, destination_file, tasknumber, limit_actual_routes=0, prints=False, printsalot=False)
Creates the required file for a specific task number. The output is True
filename_actual is the path to the JSON file containing the actual routes.
filename_sort is the path to the JSON file containing the routes that need to be sorted (only required for task 2).
destination_file is the path where the produced file is stored.
tasknumber is the task to be solved:
- Task 1 is generating a file with new standard routes.
- Task 2 is sorting routes that are in filename_sort based on filename_actual for every driver.
- Task 3 is generating a file with the perfect route for every driver.
limit_actual_routes limits the number of routes read from the actual routes JSON file; the routes are sampled randomly without replacement. If 0 (the default), all routes are used.
prints can be set to True to show the progress of the function.
printsalot can be set to True to show more detailed output.
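createfiles ties the three tasks together and writes each task's result to destination_file. The sketch below shows such a dispatcher under stated assumptions: each task's result is JSON-serializable and is written with json.dump, and only task 2 needs the file of routes to sort. create_task_file and the solvers mapping are hypothetical stand-ins for the project's own functions, not its actual implementation.

```python
import json
import os
import tempfile


def create_task_file(destination_file, tasknumber, solvers, filename_sort=None):
    """Hypothetical dispatcher: run the solver for tasknumber, write its result as JSON.

    solvers maps 1, 2 and 3 to callables standing in for the project's functions;
    only task 2 receives the file of routes to sort.
    """
    if tasknumber not in (1, 2, 3):
        raise ValueError(f"tasknumber must be 1, 2 or 3, got {tasknumber!r}")
    if tasknumber == 2:
        if filename_sort is None:
            raise ValueError("task 2 requires filename_sort")
        result = solvers[2](filename_sort)
    else:
        result = solvers[tasknumber]()
    with open(destination_file, "w", encoding="utf-8") as fh:
        json.dump(result, fh, indent=2)
    return True


# Stub solvers returning fixed data, just to exercise the dispatcher
stub_solvers = {
    1: lambda: [["Verona", "Trento", "Venezia"]],
    2: lambda path: {"driver_1": ["s3", "s2", "s1"]},
    3: lambda: {"driver_1": ["Verona", "Trento"]},
}
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "task1.json")
    ok = create_task_file(out_path, 1, stub_solvers)
    with open(out_path, encoding="utf-8") as fh:
        written = json.load(fh)
print(ok, written)
```

Validating tasknumber and filename_sort before running any solver means a misconfigured call fails immediately instead of after a long mining run.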