This repo contains a Docker Compose stack that runs the MediaWiki software.
Clone the repo. Then create and start the containers:
cd docker-bugsigdb.org
docker compose up --no-start
# copy a database dump (*.sql or *.sql.gz) to the __initdb directory if needed
docker run --rm -v <images/directory>:/source -v <volume_prefix>_images:/target busybox cp -a /source/. /target/
# copy .env.example to .env and modify as needed (see the Settings section)
cp .env.example .env
docker compose up -d
Wait for the build and initialization process to complete, then access the wiki at http://localhost:8081 in a browser.
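While it initializes, you can check service status and follow the web container's logs with standard Compose commands:

```bash
docker compose ps            # all services should eventually report a running state
docker compose logs -f web   # follow the MediaWiki build/initialization output
```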
Running `docker compose up -d` will start the following containers:

- `db` - official MySQL container, used as the database backend for MediaWiki
- `web` - Apache/MediaWiki container (Taqasta) with PHP 7.4 and MediaWiki 1.39.x
- `redis` - Redis, an open-source key-value store used as the cache backend
- `matomo` - Matomo analytics instance
- `elasticsearch` - advanced search engine
- `varnish` - a reverse caching proxy and HTTP accelerator
- `restic` - (production only) modern backup container performing incremental backups to both S3 storage and Google Cloud Storage (GCS)
- `updateEFO` - (production only) a Python script that updates EFO links on glossary pages automatically
Settings can be adjusted via the `.env` file created from `.env.example`. Environment variables and other general configuration live in the `environment` sections of `compose.yml` and the environment-specific override files (`compose.staging.yml`, `compose.PRODUCTION.yml`).
Additionally:
- `_resources` directory: contains the favicon, logo, styles, and customizations for the Chameleon skin and additional MediaWiki extensions
- `_settings/LocalSettings.php`: contains settings for MediaWiki core and extensions; if customization is required, change them there
- For production backups with restic, create the file `./secrets/restic-GCS-account.json` containing your Google Cloud Storage credentials
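For example (the source path of the credentials file is illustrative):

```bash
mkdir -p ./secrets
cp /path/to/gcs-service-account.json ./secrets/restic-GCS-account.json
```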
The database backend is the official MySQL 8 container. The most important environment variable is `MYSQL_ROOT_PASSWORD`; it specifies the password for the MySQL `root` superuser account. If you change it, make sure the corresponding database password (`MW_DB_PASS` in the `web` section) is updated accordingly.
- `MW_SITE_SERVER` configures `$wgServer`; set this to the server host, including the protocol, e.g. `https://bugsigdb.org`
- `MW_SITE_NAME` configures `$wgSitename`
- `MW_SITE_LANG` configures `$wgLanguageCode`
- `MW_DEFAULT_SKIN` configures `$wgDefaultSkin`
- `MW_ENABLE_UPLOADS` configures `$wgEnableUploads`
- `MW_ADMIN_USER` configures the default administrator username
- `MW_ADMIN_PASSWORD` configures the default administrator password
- `MW_DB_NAME` specifies the database name MediaWiki uses
- `MW_DB_USER` specifies the DB user MediaWiki uses; default is `root`
- `MW_DB_PASS` specifies the DB user password; must match your MySQL password
- `MW_PROXY_SERVERS` configures `$wgSquidServers` for reverse proxies (typically `varnish:80`)
- `MW_MAIN_CACHE_TYPE` configures `$wgMainCacheType` (`CACHE_REDIS` is recommended)
- `MW_LOAD_EXTENSIONS` comma-separated list of MediaWiki extensions to load during container startup
- `MW_LOAD_SKINS` comma-separated list of MediaWiki skins available for use
- `MW_SEARCH_TYPE` configures the search backend (typically `CirrusSearch`)
- `MW_NCBI_TAXONOMY_API_KEY`, `MW_RECAPTCHA_SITE_KEY`, `MW_RECAPTCHA_SECRET_KEY` - optional third-party API keys
- `MW_ENABLE_SITEMAP_GENERATOR` enables the sitemap generator script on production (`true`/`false`)
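A minimal `.env` excerpt could look like the following; every value here is illustrative and should be adapted to your deployment:

```bash
MW_SITE_SERVER=https://bugsigdb.org
MW_SITE_NAME=BugSigDB
MW_SITE_LANG=en
MW_DB_USER=root
MW_DB_PASS=change-me             # must match MYSQL_ROOT_PASSWORD when using root
MW_PROXY_SERVERS=varnish:80
MW_MAIN_CACHE_TYPE=CACHE_REDIS
MW_SEARCH_TYPE=CirrusSearch
```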
The `restic` container handles scheduled backups (with weekly/monthly retention settings) through incremental snapshots:

- `RESTIC_PASSWORD` - password used to encrypt the backups
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` - access credentials for S3-compatible storage
- `BACKUP_CRON`, `CHECK_CRON` - cron schedules for the automatic backup and check operations
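For example, with standard five-field cron syntax (the schedules are illustrative):

```bash
BACKUP_CRON=0 3 * * *   # daily backup at 03:00
CHECK_CRON=0 5 * * 1    # repository check every Monday at 05:00
```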
This Python-based container automatically updates EFO terms and links in the glossary:
- `UPDATE_EFO_BOT_PASSWORD` - authentication password for the bot account
- `UPDATE_EFO_PAUSE` - update frequency in seconds (default 86400 sec / 24 h)
Note: the script can put extra load on the wiki, so it is recommended to schedule it for night time. Also keep in mind that processing all the pages takes time; an average script cycle is ~4-8 hours. You can change the sleep timeouts via the `-z` parameter.
The Matomo instance provides website analytics:

- Default admin username: `admin`
- `MATOMO_PASSWORD` - sets the initial password for the Matomo administration panel
The Varnish cache container is used as a reverse proxy and front-end cache server:

- `VARNISH_SIZE` - amount of RAM to dedicate to caching (e.g., `100m`)
- `BASIC_USERNAME` - HTTP basic auth username
- `BASIC_PASSWORD` - HTTP basic auth password (hashed using `openssl passwd -apr1`)
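To generate a value for `BASIC_PASSWORD` (the plaintext password here is an example):

```bash
openssl passwd -apr1 'my-secret-password'
# prints an $apr1$... hash suitable for BASIC_PASSWORD
```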
Bind mounts expose a certain directory or file from the host inside the container. We use:

- `./__initdb` directory: used to pass a database dump for stack initialization
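For example, before the first start of the stack (the dump filename is illustrative):

```bash
cp /path/to/bugsigdb_dump.sql.gz ./__initdb/
```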
Data that must be persistent across container life cycles is stored in Docker volumes:

- `db_data` (MySQL databases and working directories, attached to the `db` service)
- `elasticsearch_data` (Elasticsearch nodes, attached to the `elasticsearch` service)
- `web_data` (miscellaneous MediaWiki files and directories that must be persistent by design, attached to the `web` service)
- `images` (MediaWiki upload directory, attached to the `web` service and used read-only by the `restic` service)
- `redis_data` (Redis cache)
- `varnish_data` (Varnish cache)
- `matomo_data` (analytics data)
- `restic_data` (space mounted to the `restic` service for operations with snapshots)
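To locate a named volume on the host, inspect it; `<volume_prefix>` is the same Compose project prefix used in the image-copy command above:

```bash
docker volume inspect <volume_prefix>_images --format '{{ .Mountpoint }}'
```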
Docker containers write files to volumes using internal users.
Log files are stored in the `_logs` directory.
Make a full backup of the wiki, including both the database and the files. While the upgrade scripts are well-maintained and robust, things could still go awry.
cd <docker stack directory>
docker compose exec db /bin/bash -c 'mysqldump --all-databases -uroot -p"$MYSQL_ROOT_PASSWORD" 2>/dev/null | gzip | base64 -w 0' | base64 -d > backup_$(date +"%Y%m%d_%H%M%S").sql.gz
docker compose exec web /bin/bash -c 'tar -c $MW_VOLUME $MW_HOME/images 2>/dev/null | base64 -w 0' | base64 -d > backup_$(date +"%Y%m%d_%H%M%S").tar
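Optionally verify the resulting archives before upgrading (standard integrity checks):

```bash
gunzip -t backup_*.sql.gz         # verify the gzip stream is intact
tar -tf backup_*.tar > /dev/null  # verify the tar archive is readable
```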
To pick up the latest changes, stop, rebuild, and start the containers:
cd <docker stack directory>
git pull
docker compose up -d
The upgrade process is fully automated and includes the launch of all necessary maintenance scripts.
The image is configured to automatically purge the homepage once per hour. You can configure this using the following environment variables:
MW_CACHE_PURGE_PAUSE=3600
MW_CACHE_PURGE_PAGE=Main_Page
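Independently of this hourly loop, a page can be purged manually through MediaWiki's `action=purge`; a minimal sketch, assuming the production hostname:

```bash
curl -X POST 'https://bugsigdb.org/index.php?title=Main_Page&action=purge'
```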
The deployment is organized as follows:
- `compose.yml`: common container definitions, typically used in the development environment
- `compose.staging.yml`: staging-specific overrides (hostnames, basic auth)
- `compose.PRODUCTION.yml`: production-specific overrides including health checks, backups, and special scripts
Before running docker compose commands, link your environment configuration as follows:
ln -sf compose.staging.yml compose.override.yml # staging environment
# OR
ln -sf compose.PRODUCTION.yml compose.override.yml # production environment
Then use `docker compose up -d` as usual. Docker Compose automatically merges the files.
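To confirm which override is active, render the merged configuration:

```bash
docker compose config --services   # lists services from the merged compose files
```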
To work around T333776 we run `maintenance/updateSpecialPages.php` once a day. This ensures the count of active users on Special:CreateAccount stays up to date.
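A manual run would look roughly like this, assuming (as in the backup commands above) that `$MW_HOME` points at the MediaWiki root inside the `web` container:

```bash
docker compose exec web bash -c 'php "$MW_HOME"/maintenance/updateSpecialPages.php'
```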
- bugsigdb.org: A Comprehensive Database of Published Microbial Signatures
- BugSigDB issue tracker: Report bugs or feature requests for bugsigdb.org
- BugSigDBExports: Hourly data exports of bugsigdb.org
- Stable data releases: Periodic manually-reviewed stable data releases on Zenodo
- bugsigdbr: R/Bioconductor access to published microbial signatures from BugSigDB
- Curation issues: Report curation issues, request studies to be added
- bugSigSimple: Simple analyses of BugSigDB data in R
- BugSigDBStats: Statistics and trends of BugSigDB
- BugSigDBPaper: Reproduces analyses of the Nature Biotechnology publication
- community-bioc Slack Team: Join #bugsigdb channel