eurostat/multimno

MultiMNO

This repository contains code that processes MNO Data to generate population and mobility insights indicators using the Spark framework.

📢 Description

This repository contains a Python application that uses the PySpark library to run Big Data pipelines over MNO Data and generate multiple statistical products related to mobility and sociodemographic analysis.

The code in this repository is intended to run on a PySpark-compatible cluster and to be deployed in cloud environments such as AWS, GCP or Azure. It can also be launched in local environments using a single-node Spark configuration, once all the required libraries have been set up correctly.
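
As a rough illustration of what a single-node Spark configuration looks like, the sketch below builds a local-mode session with PySpark. The helper names, the application name and the shuffle-partition value are assumptions made for this example; multimno takes its Spark settings from its own configuration files, not from this snippet.

```python
def local_spark_conf(cores="*", shuffle_partitions=8):
    """Return Spark settings for a single-node (local mode) run.

    "local[*]" uses every core of the machine; "local[2]" uses two.
    The shuffle-partition value is an illustrative assumption.
    """
    return {
        "spark.master": f"local[{cores}]",
        "spark.sql.shuffle.partitions": str(shuffle_partitions),
    }


def build_local_session(conf, app_name="multimno-local-sketch"):
    """Create a SparkSession from a settings dict (requires pyspark)."""
    # Imported lazily so the settings helper works without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Example usage (needs pyspark and a Java runtime):
#   spark = build_local_session(local_spark_conf(cores=2))
#   spark.stop()
```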

For easy deployment in local environments, this repository provides the configuration needed to build a Docker container with the whole setup already in place.

🗃️ Repository Structure

The repository contains the following directories:

| Directory | Type | Description |
| --- | --- | --- |
| .devcontainer | Development | Directory with config files for setting up a dev-environment using Dev-Containers. |
| .vscode | Development | Directory containing config files for developers using VsCode. |
| docs | Technical-Documentation | Documentation source files that will be used for the documentation site. Mainly markdown files. |
| license | Technical-Documentation | Directory containing the Software Bill of Materials (SBOM) and associated licensing documentation for software dependencies. |
| multimno | Open-source-software | Main directory of the repository. It contains the Python source code of the application. |
| pipe_configs | Data | Directory containing examples of configuration files for the execution of the pipeline. |
| sample_data | Data | Directory containing synthetic MNO Data to be used to test the software. |
| resources | Open-source-software | Directory containing requirements files and development-related configuration and script files. |
| tests | Testing | Directory containing test code and test files for the testing execution. |

📄 Documentation

The multimno documentation is divided into two main documents.

📓 User Manual

The user manual consists of three sections:

  • Configuration: Section containing the explanation of all the configuration files used by the software.
  • Setup Guide: How to prepare the system for the software execution.
  • Execution Guide: How to execute the software.

🤝 Contribute

Please read the contribution guide for the rules and guidelines on how to contribute to the multimno repository.

🖥️ Development Guidelines

Please follow the development guidelines to set up a dev-environment and to see the recommended best practices for development, testing and documentation.

📜 Licensing

Multimno software is licensed under the European Union Public License (EUPL) 1.2 as declared by its LICENSE file. To ensure transparency in its dependencies, a Software Bill of Materials (SBOM) is provided at license/sbom.json. This SBOM was generated on March 18, 2025, using CycloneDX.

Licensing Files

An accompanying Python script is provided to facilitate further analysis of the SBOM. This script processes the SBOM file and produces the licenses.csv and unique_licenses.txt files. All license-related files are stored in the license/ directory, which contains:

  • licenses.csv – A concise CSV file containing the list of dependencies, including their versions and associated licenses.
  • sbom.json – A comprehensive SBOM detailing all dependencies and their respective licenses.
  • unique_licenses.txt – A file enumerating the distinct licenses used within the software.
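
To make the relationship between the three files concrete, here is a minimal sketch of how a CycloneDX JSON SBOM could be reduced to a per-dependency CSV and a list of distinct licenses. It is not the script shipped in the repository: the function names, the CSV column names and the sample components are all assumptions for illustration. It relies only on the standard CycloneDX JSON layout, where each entry of `components` may carry a `licenses` list holding either a `license` object (with `id` or `name`) or an `expression`.

```python
import csv
import io
import json


def extract_licenses(sbom: dict) -> list[tuple[str, str, str]]:
    """Return (name, version, licenses) rows from a CycloneDX JSON document."""
    rows = []
    for component in sbom.get("components", []):
        found = []
        for entry in component.get("licenses", []):
            if "expression" in entry:
                # SPDX expression such as "MIT OR Apache-2.0"
                found.append(entry["expression"])
            else:
                lic = entry.get("license", {})
                found.append(lic.get("id") or lic.get("name") or "UNKNOWN")
        rows.append((
            component.get("name", ""),
            component.get("version", ""),
            "; ".join(found) or "UNKNOWN",
        ))
    return rows


def to_csv(rows) -> str:
    """Serialize rows as CSV text (column names are an assumption)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "version", "licenses"])
    writer.writerows(rows)
    return buffer.getvalue()


def unique_licenses(rows) -> list[str]:
    """Return the sorted set of distinct license identifiers."""
    return sorted({lic for _, _, joined in rows for lic in joined.split("; ")})


# Hypothetical sample input; real data would be loaded from the SBOM file.
SAMPLE = json.loads("""
{
  "bomFormat": "CycloneDX",
  "components": [
    {"name": "example-a", "version": "1.0.0",
     "licenses": [{"license": {"id": "Apache-2.0"}}]},
    {"name": "example-b", "version": "2.1.0",
     "licenses": [{"license": {"id": "MIT"}},
                  {"license": {"name": "BSD-3-Clause"}}]},
    {"name": "example-c", "version": "0.3.0"}
  ]
}
""")

rows = extract_licenses(SAMPLE)
print(to_csv(rows))
print("\n".join(unique_licenses(rows)))
```

Components without any license entry are reported as `UNKNOWN` rather than dropped, so that gaps in the SBOM stay visible during a compatibility review.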

As of March 18, 2025, the licensing data has been included in the repository and validated against the EUPL matrix of compatible open-source licenses.

🅿️ Pipeline

The Big Data processing pipeline performed by the software is described in the following document: MultiMNO Pipeline

🛠️ Mandatory Requirements

Please verify that your system fulfils the System Requirements to make sure it can execute the code.

📦 Synthetic data

Synthetic MNO data is included in the repository so that users can try a full pipeline locally on their own computer. The data is located in the sample_data/lakehouse/bronze directory. It has been generated synthetically with the following specs:

  • 🌍 Spatial scope: All data has been generated in a bounding box that covers the metropolitan area of Madrid. The bounding box parameters are as follows:

    • latitude_min = 40.352
    • latitude_max = 40.486
    • longitude_min = -3.751
    • longitude_max = -3.579
  • 📆 Temporal scope: Data has been generated for 9 days, from 2023-01-01 to 2023-01-09, both inclusive.

  • 🚶‍♂️ Users: 100 different users.

  • 📡 Network: 500 different cells.
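
The specs above can be captured in a few lines of Python, which is handy for sanity-checking records before running a pipeline on them. This is only a sketch: the `BoundingBox` class, the constants and the helper function are hypothetical names introduced here and are not part of multimno.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class BoundingBox:
    latitude_min: float
    latitude_max: float
    longitude_min: float
    longitude_max: float

    def contains(self, latitude: float, longitude: float) -> bool:
        """True if the point lies inside the box (edges included)."""
        return (
            self.latitude_min <= latitude <= self.latitude_max
            and self.longitude_min <= longitude <= self.longitude_max
        )


# Values taken from the spatial and temporal scope listed above.
MADRID_BBOX = BoundingBox(40.352, 40.486, -3.751, -3.579)
START, END = date(2023, 1, 1), date(2023, 1, 9)


def study_dates(start: date, end: date) -> list[date]:
    """All dates from start to end, both inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


print(len(study_dates(START, END)))            # number of days covered
print(MADRID_BBOX.contains(40.4168, -3.7038))  # a point in central Madrid
```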

The repository contains the code for generating synthetic data as multimno components so they can be launched as any other component in the pipeline. Configuration files can be found at: /opt/app/pipe_configs/configurations/00_synthetic

🏁 Quickstart

Setup

Use the following commands for a fast setup of an execution environment using docker.

Please check the Setup Guide for a more in-depth explanation of the system setup required to execute the code.

Build docker image

docker build -t multimno:1.0-prod --target=multimno-prod .

Execution

Run an example pipeline within a container:

docker run --rm --name=multimno-container -v "${PWD}/sample_data:/opt/data" -v "${PWD}/pipe_configs:/opt/app/pipe_configs" multimno:1.0-prod pipe_configs/pipelines/pipeline.json

This command will:

  • Create a docker container.
  • Mount the sample_data directory in /opt/data within the container.
  • Mount the pipe_configs directory in /opt/app/pipe_configs within the container.
  • Execute a pipeline stored in /opt/app/pipe_configs/pipelines/pipeline.json within the container. This is the same file as the one in the repository.
  • Delete the container once the execution finishes.

NOTE: If the destination paths are changed, the paths in pipeline.json and in the general_configuration.ini file must be adjusted accordingly.
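
For scripted runs, the same `docker run` invocation can be assembled programmatically, which makes each part of the command explicit. The sketch below only builds and prints the argument list; the function name and the example repository path are assumptions, while the image tag, container name, mount points and pipeline path are the ones used in the command above.

```python
import shlex
from pathlib import PurePosixPath


def docker_run_command(
    repo_root,
    image="multimno:1.0-prod",
    pipeline="pipe_configs/pipelines/pipeline.json",
    container_name="multimno-container",
):
    """Build the argument list for running a pipeline in a container.

    repo_root should be an absolute path to the cloned repository,
    since Docker bind mounts expect absolute host paths.
    """
    root = PurePosixPath(repo_root)
    return [
        "docker", "run",
        "--rm",                      # delete the container when it exits
        f"--name={container_name}",
        "-v", f"{root / 'sample_data'}:/opt/data",
        "-v", f"{root / 'pipe_configs'}:/opt/app/pipe_configs",
        image,
        pipeline,                    # resolved inside the container
    ]


# Hypothetical checkout location, used only for illustration.
cmd = docker_run_command("/home/user/multimno")
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would launch the container; keeping the arguments as a list avoids shell-quoting issues with paths that contain spaces.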
