Code to make it easy to install an EnsEMBL webserver on a fresh install of Ubuntu 14.04. The scripts in this repository will fetch dependencies and configure a local mirror of Ensembl/EnsemblGenomes with any combination of existing species using entirely remotely hosted data for minimum footprint, entirely locally hosted data for maximum performance or anywhere in between.
This is a sister project to easy-import, which simplifies the import of genomic data for any species from standard flat files into the Ensembl database schema. The latest and most complete documentation for both projects is available at easy-import.readme.io.
These instructions will get you started with an Ensembl mirror of human and mouse using locally hosted core databases, with the remaining data loaded from the public Ensembl MySQL servers.
This is the only step that requires sudo. If you wish to run the subsequent
steps as a different user, add a WEB_USER_NAME and WEB_USER_PASS to
the ini file to create this user and transfer ownership of the
SERVER_ROOT directory.
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install git
cd ~
git clone https://github.com/lepbase/easy-mirror em
cd em
sudo ./install-dependencies.sh example.ini

At least one local database must be created with write access.
These instructions assume that both the webserver and database are on
localhost. Use of separate hosts is supported but will require changes to
/etc/mysql/my.cnf to allow external connections.
./setup-databases.sh databases.ini

This step fetches/updates the Ensembl code repositories and sets up
configuration files in $SERVER_ROOT/public-plugins/mirror/conf.
./update-ensembl-code.sh example.ini

The last step starts the webserver and, if necessary, restarts it up to 5 times.
Usually this will be enough but sometimes you may need to run this script
again before your Ensembl mirror site becomes available at
http://localhost:$HTTP_PORT/
./reload-ensembl-site.sh my.ini

To set up an EnsemblGenomes mirror with four locally hosted Lepidopteran
species, simply use the provided eg.ini file in place of example.ini
and eg-databases.ini in place of databases.ini. You will need to run
steps 2 and 3 again after any changes to the database locations.
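Because the site started by reload-ensembl-site.sh may need more than one attempt to come up, it can be convenient to poll the URL rather than check by hand. The following is a minimal sketch and not part of this repository: the retry helper and the RETRY_DELAY variable are hypothetical names, and HTTP_PORT is assumed to be set in your shell to the value from your ini file.

```shell
# retry MAX CMD [ARGS...]: run CMD until it succeeds, at most MAX times.
# Hypothetical helper for illustration, not part of easy-mirror.
retry() {
  max=$1
  shift
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
    # Pause between attempts; RETRY_DELAY (seconds) is an assumed variable name.
    sleep "${RETRY_DELAY:-5}"
  done
  return 1
}

# Example (commented out because it needs a running site):
# retry 5 curl -fsS -o /dev/null "http://localhost:$HTTP_PORT/"
```

The helper returns a non-zero status if every attempt fails, so it can be chained with a second run of the reload script.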
Provided the relevant dumps are available at ftp://ftp.ensembl.org/pub/ or
ftp://ftp.ensemblgenomes.org/pub/, any database on the Ensembl sites can be
specified in a databases.ini file to be hosted locally.
Using databases-extra.ini or eg-databases-extra.ini in step 2 will
fetch more data for local hosting, either by using the SPECIES_DB_AUTO_EXPAND
variable to list database types to attempt to retrieve in addition to the core
database, or by listing additional databases (e.g. compara) to host locally.
Using separate webserver and database hosts is supported by changing the
ENSEMBL_WEBSITE_HOST variable in databases.ini to something other than
localhost. However, you will need to update your /etc/mysql/my.cnf file
to allow database connections from another server. Leaving the
ENSEMBL_WEBSITE_HOST variable empty will set up users that are allowed to
connect from any host.
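On a default Ubuntu MySQL install, the setting that most often blocks connections from another server is bind-address, which restricts the server to the loopback interface. Whether this is the only change your setup needs is an assumption; the sketch below only reports the current value, and mysql_bind_address is a hypothetical helper name.

```shell
# mysql_bind_address FILE: print the value of bind-address in a MySQL
# config file (prints nothing if the option is not set).
# Hypothetical helper for illustration, not part of easy-mirror.
mysql_bind_address() {
  sed -n 's/^[[:space:]]*bind-address[[:space:]]*=[[:space:]]*//p' "$1"
}

# Example (commented out because the file may not exist on your machine):
# mysql_bind_address /etc/mysql/my.cnf
```

A value of 127.0.0.1 means only local connections are accepted; after editing the file, MySQL must be restarted for the change to take effect.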
Configuration options for steps 1, 3 and 4.
Four subsections with DB_[*_]HOST, DB_[*_]PORT, DB_[*_]USER and
DB_[*_]PASS variables specify connection settings for:
DB_HOST etc. - the primary database host with species/multi-species databases.
DB_SESSION_HOST etc. - user-specific information, typically the only database to require read-write access and therefore a password protected connection.
DB_FALLBACK_HOST etc. - to reduce the amount of locally hosted data, it is often desirable to use alternate sources for some databases; the DB_FALLBACK_HOST host will be queried to find any required databases that are not available on DB_HOST.
DB_FALLBACK2_HOST etc. - especially with EnsemblGenomes sites, remote databases may be found on more than one host; the DB_FALLBACK2_HOST host will be queried to find any required databases that are not available on DB_HOST or DB_FALLBACK_HOST.
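The fallback order described above can be sketched as a lookup that tries each host in turn and reports the first one that has the database. This is an illustration of the idea only, not the code the scripts use: find_db_host, db_exists_on and DB_QUERY_USER are hypothetical names, hosts are written as host:port for brevity, and a password-free anonymous user is assumed, which holds for the public Ensembl servers but probably not for your local DB_HOST.

```shell
# db_exists_on HOST:PORT DBNAME: succeed if DBNAME is listed on that server.
# Assumes the mysql client is installed and the user needs no password.
db_exists_on() {
  host=${1%%:*}
  port=${1##*:}
  [ -n "$(mysql -h "$host" -P "$port" -u "${DB_QUERY_USER:-anonymous}" \
      -N -e "SHOW DATABASES LIKE '$2'" 2>/dev/null)" ]
}

# find_db_host DBNAME HOST:PORT...: print the first host that has DBNAME.
find_db_host() {
  db=$1
  shift
  for candidate in "$@"; do
    if db_exists_on "$candidate" "$db"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Example (commented out because it needs network access):
# find_db_host homo_sapiens_variation_84_38 localhost:3306 ensembldb.ensembl.org:3306
```

Listing the hosts in the order DB_HOST, DB_FALLBACK_HOST, DB_FALLBACK2_HOST reproduces the precedence described in the list above.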
To set up a non-admin user to run steps 2, 3 and 4, specify WEB_USER_NAME
and WEB_USER_PASS to create a new user with ownership of the
SERVER_ROOT directory.
Connection/branch information for the Github repositories to be cloned
ENSEMBL_URL/ENSEMBL_BRANCH - Ensembl code
EG_URL/EG_BRANCH - (optional) EnsemblGenomes code
BIOPERL_URL/BIOPERL_BRANCH - BioPerl code
HTTP_PORT - port to run the Apache webserver on (reload-ensembl-site.sh will need to be run with root privileges if this is set to a value below 1024).
SERVER_ROOT - the directory into which all Ensembl code will be cloned and from which the site will be run.
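To decide in advance whether the reload script will need root privileges, the port can be read from the ini file and compared with 1024. The sketch below assumes the ini file uses one KEY = value pair per line, as in the SPECIES_DBS example in this document; ini_get and needs_root are hypothetical helper names and are not provided by the repository.

```shell
# ini_get FILE KEY: print the value of the first "KEY = value" line in FILE.
# Hypothetical helper; section headers are ignored for simplicity.
ini_get() {
  sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | head -n 1
}

# needs_root PORT: succeed if PORT is a privileged port (below 1024).
needs_root() {
  [ "$1" -lt 1024 ]
}

# Example (commented out because it depends on your ini file):
# port=$(ini_get example.ini HTTP_PORT)
# if needs_root "$port"; then echo "run reload-ensembl-site.sh with sudo"; fi
```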
Database names to set up config files for/connect to
SPECIES_DBS - a space separated list of Ensembl core dbs in square brackets.
SPECIES_DB_AUTO_EXPAND - to save listing all dbs for a given species, this variable may be used to specify a set of replacement strings to attempt to connect to (e.g. specify SPECIES_DBS = [ homo_sapiens_core_84_38 ] and SPECIES_DB_AUTO_EXPAND = [ variation ] to also load the database homo_sapiens_variation_84_38, if it exists on DB_HOST or a DB_FALLBACK_HOST).
MULTI_DBS - a space separated list of multispecies databases in square brackets.
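The name substitution behind SPECIES_DB_AUTO_EXPAND can be illustrated with a few lines of shell. This is a sketch of the naming rule as described above, not the code used by the scripts; expand_db_names is a hypothetical helper, and the assumption is that the type name replaces the word core between underscores.

```shell
# expand_db_names CORE_DB TYPE...: print CORE_DB, then one candidate name
# per TYPE with "core" replaced by that type.
# Hypothetical helper for illustration, not part of easy-mirror.
expand_db_names() {
  core_db=$1
  shift
  printf '%s\n' "$core_db"
  for db_type in "$@"; do
    printf '%s\n' "$core_db" | sed "s/_core_/_${db_type}_/"
  done
}

expand_db_names homo_sapiens_core_84_38 variation
# prints:
# homo_sapiens_core_84_38
# homo_sapiens_variation_84_38
```

Each candidate name is only an attempt; a database that does not exist on any configured host is simply not loaded.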
Configuration options for step 2.
Root user connection details and user names (and passwords) for database users to be created
The name of the ENSEMBL_WEBSITE_HOST host (on which steps 1, 3 and 4 are
run) is used when setting up the database users. If this is anything other
than localhost then changes will be required to /etc/mysql/my.cnf to
support external connections.
Locations and names of database dumps to fetch and load locally.
ENSEMBL_DB_URL - the URL containing the Ensembl database dumps.
ENSEMBL_DB_REPLACE - a flag to specify whether to overwrite databases that already exist on the DB_HOST.
ENSEMBL_DBS - a space separated list of database dump names in square brackets. ensembl_accounts is required, all others are optional.
The equivalent variables may be set for EG_DB_URL to fetch and download
EnsemblGenomes database dumps and for MISC_DB_URL to support situations
where the required databases are spread across multiple hosts.
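As a rough illustration of how a dump URL and a dump name fit together, the sketch below joins the two into the directory that would hold the dump files. It assumes that ENSEMBL_DB_URL points at a release's mysql directory on the Ensembl FTP site and that each dump lives in a subdirectory named after the database; dump_url is a hypothetical helper, and the release number in the example is chosen only to match the database names used elsewhere in this document.

```shell
# dump_url BASE_URL DUMP_NAME: print the directory URL for one database dump.
# Hypothetical helper; tolerates a single trailing slash on BASE_URL.
dump_url() {
  printf '%s/%s/\n' "${1%/}" "$2"
}

dump_url "ftp://ftp.ensembl.org/pub/release-84/mysql/" homo_sapiens_core_84_38
# prints: ftp://ftp.ensembl.org/pub/release-84/mysql/homo_sapiens_core_84_38/
```

Listing that URL in a browser or FTP client before running step 2 is a quick way to confirm that a dump name is spelled correctly.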
An additional variable may be set for species databases,
SPECIES_DB_AUTO_EXPAND - a space separated list of database types to use
as replacement strings for core to facilitate downloading multiple
database types for each species in SPECIES_DBS