MediaBridge is a project being developed at the Noisebridge hackerspace in San Francisco, CA, USA. See also the Noisebridge homepage and the wiki entry for this project.
MediaBridge is at a very early stage of development. Its intended functionality is to provide recommendations that bridge media types. For example, you might say you're interested in the film Saw, and MediaBridge might recommend the video game Silent Hill or a Stephen King book. For now, we are working on simply returning recommendations for movies, based on the Netflix Prize dataset.
Currently, we are only accepting contributions from members of the project who meet in person at Noisebridge.
This code requires Python 3.12.
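A quick way to confirm the interpreter before installing anything is a version check. The following is a minimal sketch, assuming "requires Python 3.12" means 3.12 or newer; the helper name `meets_minimum` is hypothetical and not part of the project.

```python
import sys

# Minimum interpreter version stated in this README (assumed to be a lower bound).
MINIMUM_PYTHON = (3, 12)


def meets_minimum(version: tuple, minimum: tuple = MINIMUM_PYTHON) -> bool:
    """Return True when `version` is at least `minimum` (major, minor)."""
    # Compare only the (major, minor) part; tuples compare element by element.
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    if meets_minimum(sys.version_info):
        print("Python version OK:", sys.version.split()[0])
    else:
        print(f"Python {MINIMUM_PYTHON[0]}.{MINIMUM_PYTHON[1]} or newer is required")
```

If the project turns out to be pinned to exactly 3.12, replace `>=` with `==` in the comparison.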
To install the project Python dependencies, first install pipenv globally with pip install pipenv. Then create a virtual environment and install the dependencies with pipenv install --dev.
To install the frontend dependencies, cd into mediabridge-frontend and run npm install.
To run code in the pipenv virtual environment, prefix your command with pipenv run (e.g., pipenv run python runs the Python interpreter in the pipenv environment).
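To see the difference the prefix makes, you can ask the interpreter whether it is running inside a virtual environment. This is a small sketch using only the standard library; the function name `in_virtualenv` and the file name `check_env.py` below are hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside a venv/virtualenv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
```

Saved as check_env.py, running it with pipenv run python check_env.py should report a virtual environment, while running it with a system-wide interpreter typically will not.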
Once you have a new mediabridge environment set up, here are the first commands you should run:
- pipenv run pre-commit install: does sanity checks on each commit
- pipenv run mb init: downloads 100 M ratings from the Netflix Prize dataset
- pipenv run mb load: fills several indexed SQLite tables with the ratings data
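As an illustration of what "filling an indexed SQLite table" can look like, here is a minimal sketch using the standard sqlite3 module. The table name, column names, index name, and sample rows are all assumptions made for the example; they are not the project's actual schema or data.

```python
import sqlite3

# Hypothetical sample rows: (user_id, movie_id, rating). Not real Netflix Prize data.
SAMPLE_RATINGS = [
    (1, 10, 5),
    (1, 20, 3),
    (2, 10, 4),
    (3, 30, 2),
]


def load_ratings(conn: sqlite3.Connection, rows: list) -> None:
    """Create an illustrative `rating` table, bulk-insert rows, and index it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rating ("
        " user_id INTEGER NOT NULL,"
        " movie_id INTEGER NOT NULL,"
        " rating INTEGER NOT NULL)"
    )
    # `with conn` commits on success and rolls back on error.
    with conn:
        conn.executemany("INSERT INTO rating VALUES (?, ?, ?)", rows)
        # Creating the index after the bulk insert avoids updating it row by row.
        conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_rating_movie ON rating (movie_id)"
        )


def ratings_for_movie(conn: sqlite3.Connection, movie_id: int) -> list:
    """Return (user_id, rating) pairs for one movie, ordered by user."""
    cursor = conn.execute(
        "SELECT user_id, rating FROM rating WHERE movie_id = ? ORDER BY user_id",
        (movie_id,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_ratings(connection, SAMPLE_RATINGS)
    print(ratings_for_movie(connection, 10))
```

With 100 M rows, the index on movie_id is what keeps per-movie lookups from scanning the whole table.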
To run the Term Frequency - Inverse Document Frequency (TF-IDF) recommender:
pipenv run mb tf-idf "MOVIE_NAME_1" ?"MOVIE_NAME_2"... ?--options
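For readers new to TF-IDF, the idea can be sketched in a few lines of standard-library Python: weight each term by how often it appears in a document and how rare it is across all documents, then rank other items by cosine similarity. This is an illustrative sketch, not the project's recommender; the catalogue, its descriptive keywords, and the function names are all invented for the example.

```python
import math
from collections import Counter

# Hypothetical catalogue: title -> descriptive keywords. Not taken from the project.
CATALOGUE = {
    "Saw": "horror trap killer puzzle",
    "Silent Hill": "horror fog town monster",
    "Toy Story": "animated toys friendship adventure",
    "Finding Nemo": "animated fish ocean adventure",
}


def tf_idf_vectors(docs: dict) -> dict:
    """Return a sparse TF-IDF vector (term -> weight) for each document."""
    tokenized = {title: text.lower().split() for title, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for title, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[title] = {
            # tf = relative frequency within the document; idf = log(N / df).
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(title: str, docs: dict, top_n: int = 1) -> list:
    """Return the `top_n` titles most similar to `title`."""
    vectors = tf_idf_vectors(docs)
    query = vectors[title]
    scored = [(cosine(query, vec), other) for other, vec in vectors.items() if other != title]
    # Highest similarity first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:top_n]]


if __name__ == "__main__":
    print(recommend("Saw", CATALOGUE))
```

In this toy catalogue, "Saw" shares only the term "horror" with "Silent Hill" and nothing with the animated films, so "Silent Hill" ranks first.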
You may find it convenient to work on the project in a Linux Docker container.
Please run pipenv run lint before each git push, to keep the codebase clean.
To fix import errors and enable other IntelliSense features, make sure VSCode knows about your pipenv environment. To do that:
- Open the VSCode command palette (Control/Command+SHIFT+P)
- Search for and select the "Python: Select Interpreter" command
- Choose the option that starts with MediaBridge
When debugging, it's sometimes convenient to discard a venv and start from scratch:
- pipenv --rm: discards the existing venv, if there is one
- pipenv lock --dev: queries pypi.org to rewrite the Pipfile.lock file
- pipenv install --dev: installs the locked versions (and yes, --dev is needed)
You may also find pipenv update --dev useful if a dependency is out of date.
For development purposes, you can simply run the CLI script:
pipenv run mb
Be sure to specify options such as -v and -l before any subcommand (process, load, etc.) and its arguments.
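The ordering rule is typical of CLIs where global options belong to the top-level command rather than to each subcommand. The sketch below reproduces that behavior with the standard argparse module; it is an assumption for illustration only, since this README does not say which CLI library mb uses, and the long option names and help texts are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative CLI with global options and subcommands."""
    parser = argparse.ArgumentParser(prog="mb")
    # Global options live on the top-level parser, so they are only
    # recognized before the subcommand name.
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="hypothetical meaning: verbose output")
    parser.add_argument("-l", "--log", action="store_true",
                        help="hypothetical meaning: write a log file")
    subcommands = parser.add_subparsers(dest="command", required=True)
    subcommands.add_parser("process")
    subcommands.add_parser("load")
    return parser


def is_accepted(argv: list) -> bool:
    """Return True if the argument list parses without error."""
    try:
        build_parser().parse_args(argv)
    except SystemExit:
        # argparse reports bad arguments on stderr and exits; treat that as a rejection.
        return False
    return True


if __name__ == "__main__":
    print(is_accepted(["-v", "load"]))   # option before the subcommand
    print(is_accepted(["load", "-v"]))   # option after the subcommand
```

In this sketch the second call is rejected because the load subparser knows nothing about -v, which mirrors the advice above.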
NOTE: If you encounter a ModuleNotFoundError, make sure you are in the root directory of the project, as the mediabridge directory is the module Pipenv is trying to reference.
The mb command is currently just an alias for running the main script with pipenv run python -m mediabridge.main, but this may change in the future, so using pipenv run mb will ensure the correct script is always run.
The API server powers the frontend, and connects it to the backend recommendation code. To run it, use pipenv:
pipenv run mb serve
Leave the shell where you ran that command open and running. Requests that go to the API server are shown in the same window. If you visit http://localhost:5000, you will see a hello world message.
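To make the request/response round trip concrete, here is a self-contained sketch of a hello-world HTTP endpoint and a client that fetches it. It uses the standard library's http.server as a stand-in, because this README does not state which web framework the API server uses; the handler, the helper names, and the exact greeting text are assumptions.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET request with a plain-text greeting."""

    def do_GET(self) -> None:
        body = b"Hello, world!"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def fetch(url: str) -> tuple:
    """GET `url` and return (status code, decoded body)."""
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        return response.status, response.read().decode("utf-8")


def demo() -> tuple:
    """Start the stand-in server on a free port, fetch "/", and shut down."""
    # Port 0 lets the OS pick a free port, so this never collides with a
    # real API server already listening on port 5000.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch(f"http://127.0.0.1:{server.server_port}/")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The `fetch` helper can also be pointed at http://localhost:5000 while pipenv run mb serve is running, to check the real server from a second shell.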
Change directory into mediabridge-frontend and run:
npm run dev
To run Python unit tests:
pipenv run test
These tests are also evaluated via a GitHub action when opening or updating a PR and must pass before merging.
To run Cypress (frontend/TypeScript tests):
npx cypress open
We use ruff for code formatting, linting, and import sorting. If you've installed the project with the instructions above, you should have access to the ruff binary.
The repo comes with a .vscode directory that contains a recommended ruff extension, as well as settings to set ruff as your Python formatter and to format code and sort imports on save. If you're not using VSCode, you can run ruff format from the project root directory to format all Python code.
There is a GitHub Actions "check" for code formatting, which will fail if your PR contains unformatted code.