This repository provides a web application for displaying molecule library information. Originally developed for "kraken - Kolossal viRtual dAtabase for moleKular dEscriptors of orgaNophosphorus ligands", the platform has evolved into a flexible, multi-tenant system (one codebase serving multiple independent libraries) that can host multiple molecular databases with custom branding. You can see all available libraries at descriptor-libraries.molssi.org.
The app is built with FastAPI (a Python web framework) for the backend and React (a JavaScript UI library) for the frontend. It is deployed as Docker containers that connect to a PostgreSQL database, which holds the molecule and conformer information and uses the RDKit extension for chemistry operations.
A minimal demo version with 10 sample cyanoarene molecules is available for testing the application locally.
- Clone the repository:

      git clone https://github.com/Descriptor-Libraries/descriptor-libraries-framework.git
      cd descriptor-libraries-framework

- Start the demo:

      docker compose up

- Access the application:
  - Web App: http://localhost/demo
  - API Docs: http://localhost/api/demo/docs
  - Traefik Dashboard: http://localhost:8080
  - Database: localhost:5433 (user: postgres, password: postgres, db: demo_data)
- Stop the application:

      docker compose down

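Because the database loads its demo data on first start, the web app and API may take a moment to respond after `docker compose up`. The following is a minimal sketch, not part of the repository, of a readiness check that polls the two HTTP URLs listed above. The names `http_ok` and `wait_until_ready`, the retry count, and the delay are illustrative assumptions.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, List

# HTTP endpoints taken from the access list above.
DEMO_URLS = [
    "http://localhost/demo",
    "http://localhost/api/demo/docs",
]


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET request to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_ready(
    urls: List[str],
    probe: Callable[[str], bool] = http_ok,
    attempts: int = 30,   # assumed default, tune for your machine
    delay: float = 2.0,   # seconds between polling rounds (assumed)
) -> bool:
    """Poll all URLs until every one responds, or give up after `attempts` rounds."""
    for _ in range(attempts):
        if all(probe(url) for url in urls):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_until_ready(DEMO_URLS)` from a script returns `True` once both endpoints answer. The `probe` parameter is injectable so the polling logic can be exercised without a running stack.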
- 10 Cyanoarene Molecules: Representative sample with SMILES, 3D coordinates, PCA/UMAP projections
- DFT Properties: HOMO, LUMO, dipole moment, chemical hardness (η)
- Chemical Search: RDKit-powered substructure and similarity search
- Interactive UI: Browse, search, and visualize molecular data
The database automatically initializes with demo data on first run: the Postgres container creates demo_data and runs bundled init scripts to create the schema and load the CSVs.
- Backend: FastAPI on Uvicorn (high-performance Python web server), SQLAlchemy, Pydantic Settings (configuration management)
- Database: PostgreSQL with the RDKit extension
- Frontend: React + Vite (build tool) with MUI (Material-UI components), Plotly (interactive graphs), Ketcher (molecule editor), built to static files and served by a small Node server (server.js)
- Reverse proxy: Traefik v2 for HTTPS certificates and path-based routing (directing URLs to the right service)
- Depictions: CDK Depict service for converting SMILES notation to images
- Containers: Docker for packaging all services; Docker Compose for local/dev orchestration
The application supports multiple molecular libraries through a namespace-based architecture. Each library is deployed as a separate instance with its own namespace:
- URL Namespace: Each library gets its own URL path (e.g., /acids, /cyanoarenes, /demo)
- API Namespace: Backend serves at /api/<namespace> (e.g., /api/acids/molecules)
- Database Isolation: Each library has its own database (e.g., acids_data, demo_data)
- Custom Branding: Library-specific logos, colors, and content
Libraries are configured through environment variables. The frontend uses VITE_BASE_URL=/<namespace> (like /demo), the backend uses API_PREFIX=<namespace> (like demo), and each gets its own database with POSTGRES_DB=<namespace>_data (like demo_data). This creates clean URL routing where frontends are served at /<namespace>, APIs at /api/<namespace>, and documentation at /api/<namespace>/docs.
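The naming convention above can be sketched as a small function that derives every value from the namespace. This helper is hypothetical and does not exist in the repository. The variable names (`VITE_BASE_URL`, `API_PREFIX`, `POSTGRES_DB`) and URL patterns come from the paragraph above; the input check is an added assumption.

```python
from typing import Dict


def library_settings(namespace: str) -> Dict[str, Dict[str, str]]:
    """Derive environment variables and URL paths for one library namespace."""
    # Assumption: a namespace is a single, non-empty path segment.
    if not namespace or "/" in namespace:
        raise ValueError(f"invalid namespace: {namespace!r}")
    return {
        "env": {
            "VITE_BASE_URL": f"/{namespace}",    # read by the frontend
            "API_PREFIX": namespace,             # read by the backend
            "POSTGRES_DB": f"{namespace}_data",  # per-library database name
        },
        "routes": {
            "frontend": f"/{namespace}",
            "api": f"/api/{namespace}",
            "docs": f"/api/{namespace}/docs",
        },
    }
```

For the demo library, `library_settings("demo")` yields `POSTGRES_DB` set to `demo_data` and a docs route of `/api/demo/docs`, matching the URLs used when running the demo locally.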
The Docker Compose deployment consists of these services, each with specialized startup processes:
- reverse-proxy (Traefik): Entrypoint starts Traefik with dashboard and routes requests to services based on URL
- database (PostgreSQL + RDKit): Entrypoint initializes PostgreSQL; on first start creates the demo database and runs bundled SQL to set up tables and load demo data
- backend (FastAPI): Entrypoint runs prestart.sh, which prepares the Python environment and starts the API with auto-reload
- cdk-depict: Entrypoint starts the depiction service used to render molecule images
- frontend (React): Entrypoint runs renameBase-dev.sh, which updates the app's base path (e.g., /demo) and starts the dev server
The renameBase-dev.sh script is essential for multi-tenant support. It replaces the placeholder /base_url in generated files with the value of VITE_BASE_URL, allowing the frontend to work under namespace paths like /demo instead of only at the site root /. After adjusting links and asset paths, it starts the development server so routes and static assets resolve correctly behind the reverse proxy.
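The placeholder substitution can be illustrated in Python. This is a sketch of the idea only: the real script is a shell script, and the function name `rename_base`, the directory argument, and the list of file extensions are assumptions made for the example.

```python
from pathlib import Path
from typing import Tuple

PLACEHOLDER = "/base_url"  # placeholder named in the paragraph above


def rename_base(
    build_dir: Path,
    base_url: str,
    suffixes: Tuple[str, ...] = (".html", ".js", ".css"),  # assumed file types
) -> int:
    """Replace the placeholder in text assets under `build_dir`.

    Returns the number of files that were rewritten.
    """
    changed = 0
    for path in build_dir.rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue  # skip directories and binary assets such as images
        text = path.read_text(encoding="utf-8")
        if PLACEHOLDER in text:
            path.write_text(text.replace(PLACEHOLDER, base_url), encoding="utf-8")
            changed += 1
    return changed
```

With `VITE_BASE_URL=/demo`, a reference such as `/base_url/assets/app.js` becomes `/demo/assets/app.js`. Running the function a second time changes nothing, because the placeholder no longer appears in the files.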
Running the Demo Application
The production deployment mirrors the demo stack but keeps the PostgreSQL database persistent and mounts a shared scratch directory for imports. Key differences:
- Database image: Build the Postgres + RDKit image defined in database/Dockerfile and tag it (e.g., docker build -t rdkit-postgres -f database/Dockerfile .).
- Persistent volumes: Instead of the demo tmpfs database, mount the production storage so data survives restarts:

      descriptor_database:
        image: rdkit-postgres
        shm_size: "4gb"
        volumes:
          - "/PATH/TO/PERSISTENT_DB:/var/lib/postgresql/data"
          - "/PATH/TO/SHARED_SCRATCH:/scratch"

  The first volume keeps the Postgres data directory on durable storage (e.g., /mnt/largestore1/descriptor-libraries/descriptor_postgres). The /scratch mount makes large CSVs and schema dumps available inside the container for the import tooling (e.g., /mnt/largestore1/janash/ccas). Adjust the host paths to match your environment.
- Backups and imports: Use the data_import/import_dataset.py helper (with the YAML configs under data_import/configs/) to load new libraries or refresh an existing one.
Other services (reverse proxy, backend, frontend, depiction) run as in the demo compose file. Ensure Traefik routes remain consistent with the namespaces you expose publicly.
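One way to reason about route consistency is to model the path-based routing in a few lines and check sample URLs against it. The sketch below is an illustration, not Traefik configuration: the function `route` and its `backend:<namespace>` and `frontend:<namespace>` labels are hypothetical, and it only assumes the `/<namespace>` and `/api/<namespace>` prefixes described earlier.

```python
from typing import List, Optional


def _under(path: str, prefix: str) -> bool:
    """True if `path` equals `prefix` or lies below it as a path segment."""
    return path == prefix or path.startswith(prefix + "/")


def route(path: str, namespaces: List[str]) -> Optional[str]:
    """Return a label for the service expected to handle `path`, or None."""
    # API prefixes are checked first so /api/<ns> is never treated as a frontend path.
    for ns in namespaces:
        if _under(path, f"/api/{ns}"):
            return f"backend:{ns}"
    for ns in namespaces:
        if _under(path, f"/{ns}"):
            return f"frontend:{ns}"
    return None  # no library is exposed at this path
```

Matching whole path segments means a URL like `/demonstration` does not match the `demo` namespace, which is the behavior to expect from prefix rules written per namespace.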
Coming soon - Test suite updates for the namespace-aware architecture are in progress.
This project is a collaboration between The Molecular Sciences Software Institute and the Center for Computer Assisted Synthesis.
MolSSI is funded by National Science Foundation grants OAC-1547580 and CHE-2136142. C-CAS is funded by National Science Foundation grant CHE-2202693.