
Framework based on a vector database to store, manage and curate large image datasets


Photoroom/dataroom


DataRoom

Screenshot of DataRoom UI


DataRoom is a high-performance AI training data management platform featuring a web UI, multimodal support (images, latents, masks), similarity search, and a Python client for programmatic access.

To try it out, follow the guide below. Also check out the Python client in the dataroom_client directory and the example notebooks in the notebooks directory.

Getting Started

The simplest and fastest way to get a DataRoom stack up and running is to use Docker. If you prefer to run it without Docker, see the Setup without Docker section.

First, copy the example environment file and the example local settings:

cp .env.example .env
cp backend/config/settings/local.example.py backend/config/settings/local.py

Build and Start Services

The following command builds and starts the Django, Postgres and OpenSearch containers:

docker compose up -d --build
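Because `docker compose up -d` returns before the containers have finished starting, the commands that follow can fail if Postgres or OpenSearch is not yet accepting connections. The following is a minimal sketch of a wait helper, assuming the compose file publishes Postgres on localhost:5432 and OpenSearch on localhost:9200 (the usual defaults for those images; check docker-compose.yml for the real mappings). The helper names are hypothetical and not part of DataRoom.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until host:port accepts a TCP connection; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def wait_for_stack(timeout: float = 120.0) -> dict[str, bool]:
    """Hypothetical helper: wait for the assumed default Postgres and OpenSearch ports."""
    services = {"postgres": 5432, "opensearch": 9200}  # assumed host port mappings
    return {name: wait_for_port("localhost", port, timeout) for name, port in services.items()}
```

An open port only means the process is listening. OpenSearch in particular may accept connections a few seconds before it can serve requests, so a management command may still need a retry.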

Collect Static Files

The static files are built as part of the Django Docker image. To collect them, run:

docker compose run --rm dataroom_django python manage.py collectstatic --link --clear --noinput

Run Database Migrations

docker compose run --rm dataroom_django python manage.py migrate

Set Up OpenSearch Indices

docker compose run --rm dataroom_django python manage.py setup_opensearch --confirm

Create admin user

docker compose run --rm dataroom_django python manage.py createsuperuser --noinput --email [email protected]

Access the application

Go to http://localhost:8000 and log in with [email protected] / admin.


Local Development

For active frontend development with instant Hot Module Replacement (HMR):

Prerequisites

  • Node.js 22.14.0: nvm use 22.14.0
  • npm 10.9.2

Setup

1. Install and activate the Node.js version specified in .nvmrc:

nvm install && nvm use

2. Install frontend dependencies:

npm install

3. Enable development mode in Django settings.

Update backend/config/settings/local.py:

# FRONTEND
# ------------------------------------------------------------------------------
DJANGO_VITE_DEV_MODE = True
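If you switch between HMR and built assets often, one option is to read the flag from an environment variable instead of editing local.py each time. This is a sketch that assumes nothing else in your settings sets DJANGO_VITE_DEV_MODE; the env_flag helper and the environment variable of the same name are hypothetical, not part of the project. The variable must be visible to the Django process, which under Docker depends on how docker-compose.yml passes environment variables into the container.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Hypothetical helper: treat "1", "true", "yes" or "on" (any case) as True."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


# FRONTEND
# ------------------------------------------------------------------------------
# Off unless the (assumed) DJANGO_VITE_DEV_MODE environment variable is truthy.
DJANGO_VITE_DEV_MODE = env_flag("DJANGO_VITE_DEV_MODE", default=False)
```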

4. Start frontend dev server locally:

npm run dev  # Runs on port 3000

# For port conflicts, override the port:
npm run dev -- --port 3001  # or any available port
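To pick a port before starting the dev server, a small Python check can tell whether 3000 is already taken and, if so, ask the operating system for an unused one by binding to port 0. This is a sketch with hypothetical function names; note the inherent race, since another process could claim the port between the check and `npm run dev`.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising.
        return sock.connect_ex((host, port)) == 0
```

For example, `port = 3000 if not port_in_use(3000) else find_free_port()` gives a value to pass as `npm run dev -- --port <port>`.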

5. Rebuild and start Django, Postgres and OpenSearch containers:

docker compose up -d --build

See Backend setup if you would like to run the Django backend locally without Docker.

Run Tests

Run all the backend tests inside the Django Docker container:

docker compose run --rm dataroom_django pytest

Pre-commit Hooks

Please install the pre-commit hooks to maintain code quality:

pre-commit install --hook-type pre-commit

Other Useful Commands

Run production server:

docker compose run --service-ports dataroom_django ./scripts/run_web.sh

Run production background tasks:

docker compose run --service-ports dataroom_django ./scripts/run_tasks.sh

Reset database:

docker compose exec dataroom_postgres bash -c "su postgres -c 'dropdb dataroom && createdb dataroom'"

View logs:

docker compose logs -f dataroom_django

Restart specific service:

docker compose restart opensearch

Static files in production

  • The entries in rollupOptions inside vite.config.js define which entry points are built.
  • Anything inside /frontend/public/ is copied over as-is. Use this for images referenced in the HTML.
  • Running npm run build builds and bundles the frontend and generates a manifest.json.
  • The built files are then available in /backend/static_built/.
  • Running python manage.py collectstatic collects the static files and runs WhiteNoise, which compresses them and adds a content hash to each filename.
  • The final static files are then ready to be served from /backend/static_collected/.
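To make the hashing and compression steps concrete, here is a rough illustration of what happens to a single file. It re-creates the naming scheme for illustration rather than calling Django or WhiteNoise: Django's manifest static files storage inserts the first 12 hex characters of the file's MD5 digest before the extension, and WhiteNoise writes a gzip-compressed copy next to each file (plus a Brotli copy when Brotli is installed). The function names are hypothetical.

```python
import gzip
import hashlib
import os


def hashed_static_name(name: str, content: bytes) -> str:
    """Return e.g. "js/main.<12 hex chars>.js" for the name "js/main.js"."""
    root, ext = os.path.splitext(name)
    # First 12 hex characters of the MD5 digest of the file contents.
    file_hash = hashlib.md5(content).hexdigest()[:12]
    return f"{root}.{file_hash}{ext}"


def gzip_variant(name: str, content: bytes) -> tuple[str, bytes]:
    """Return the ".gz" sibling filename and its gzip-compressed bytes."""
    return f"{name}.gz", gzip.compress(content)
```

Because the hash changes whenever the content changes, the hashed files can safely be served with long cache lifetimes.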

Setup without Docker

If you prefer to run the project on macOS without Docker, follow these steps.

Prerequisites

Install these prerequisites:

To use Homebrew's openssl and snappy, add the following to your .zshrc:

export LDFLAGS="-L/opt/homebrew/opt/openssl@3/lib -L/opt/homebrew/Cellar/snappy/1.1.10/lib"
export CPPFLAGS="-I/opt/homebrew/opt/openssl@3/include -I/opt/homebrew/Cellar/snappy/1.1.10/include"

Database setup

Create the database:

createdb dataroom

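Run OpenSearch, then set up the indices from a second terminal (docker compose up without -d stays in the foreground):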

docker compose up opensearch
python manage.py setup_opensearch

Backend setup

Install and use the Python version specified in .python-version:

brew install pyenv
pyenv init
pyenv install
pyenv local
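After pyenv local, it can be worth confirming that the interpreter you are about to use matches the pinned version. This is a sketch assuming .python-version holds a plain numeric version such as 3.11 or 3.11.7 (pyenv also accepts names like pypy3.10, which this does not handle); the function names are hypothetical.

```python
import sys
from pathlib import Path


def expected_python_version(path: str = ".python-version") -> tuple[int, ...]:
    """Parse a plain numeric version such as "3.11" or "3.11.7" from the file."""
    first_line = Path(path).read_text().strip().splitlines()[0]
    return tuple(int(part) for part in first_line.strip().split("."))


def python_version_matches(expected: tuple[int, ...]) -> bool:
    """Compare only as many components as the pinned version specifies."""
    return tuple(sys.version_info[: len(expected)]) == expected
```

Run from the project root, `python_version_matches(expected_python_version())` should return True once the pinned interpreter is active.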

To create and activate a virtualenv, run the following inside the root project folder:

virtualenv .venv
source .venv/bin/activate

To install all Python requirements:

pip install poetry==1.7.1
poetry install

Copy and enable local settings:

cp backend/config/settings/local.example.py backend/config/settings/local.py

Remember to update the DATABASES settings in backend/config/settings/local.py to match your local database.
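A minimal sketch of what the DATABASES entry might look like for a Homebrew Postgres on macOS, assuming the dataroom database created above, a role named after your macOS user with no password (typical for a Homebrew install), and the default port 5432. The exact structure of local.example.py may differ, so adapt this rather than pasting it over the existing setting.

```python
import os

# Assumed local values; adjust to match your own Postgres installation.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "dataroom",  # matches `createdb dataroom` above
        "USER": os.environ.get("USER", ""),  # assumed: role named after your macOS user
        "PASSWORD": "",  # assumed: no password for local connections
        "HOST": "localhost",
        "PORT": "5432",
    }
}
```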

After setting up the frontend (see Local Development), build the static files once:

npm run build

Collect the static files:

python manage.py collectstatic --link --clear --noinput
