TWILIGHT: Tall and Wide Alignments at High Throughput

License install with bioconda Published in Bioinformatics

What's New

  • TWILIGHT v0.2.0
    • Add new sequences: Support for adding new sequences to an existing alignment.
    • Protein alignment: Support for protein alignment, still relatively new and continuously improving.
    • Iterative mode: Improved for faster performance.
    • Flexible tree support: Allows using a tree that contains more sequences than the actual dataset for alignment.
    • Bug fixes: Resolved issues present in TWILIGHT v0.1.4; users are encouraged to update to v0.2.0.

Introduction

TWILIGHT (Tall and Wide Alignments at High Throughput) is a tool for ultrafast, ultralarge multiple sequence alignment. It scales to millions of long nucleotide sequences (>10,000 bases each). TWILIGHT runs on CPU-only platforms (Linux/Mac) and can use GPUs (NVIDIA via CUDA, or AMD via HIP) for further acceleration.

By default, TWILIGHT requires an unaligned sequence file in FASTA format and an input guide tree in Newick format to generate the output alignment in FASTA format (Fig. 1a, default mode). When a guide tree is unavailable, TWILIGHT provides a Snakemake workflow to estimate guide trees using external tools (Fig. 1b, iterative mode).

TWILIGHT adopts the progressive alignment algorithm (Fig. 1c) and employs tiling strategies to band alignments (Fig. 1e). Combined with a divide-and-conquer technique (Fig. 1a), a novel heuristic dealing with gappy columns (Fig. 1d) and support for GPU acceleration (Fig. 1f), TWILIGHT demonstrates exceptional speed and memory efficiency.

Figure 1: Overview of the TWILIGHT algorithm

Installation

Installation summary (choose your installation method)

TWILIGHT offers multiple installation methods for different platforms and hardware setups:

  • Conda is recommended for most users. It fully supports the default mode but only partially supports iterative mode, because some tree tools may be unavailable on certain platforms.
  • The installation script is required for AMD GPU support.
  • Docker (built from the provided Dockerfile) is recommended for full support for iterative mode.
Platform / Setup Conda Script Docker
Linux (x86_64)
Linux (aarch64) 🟡
macOS (Intel Chip)
macOS (Apple Silicon) 🟡
NVIDIA GPU
AMD GPU

🟡 The Docker image is currently built for the linux/amd64 platform. While it can run on arm64 systems (e.g., Apple Silicon or Linux aarch64) via emulation, this may lead to reduced performance.

⚠️ To enable GPU support, the appropriate GPU drivers must be installed on the host system. This applies to all installation methods (Installation script, Conda, and Docker). The CUDA toolkits and libraries are included in Conda and Docker setups, but must be installed manually when using the installation script.
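
As a quick way to see whether a GPU driver is visible before picking an installation method, the sketch below looks for a vendor management utility on PATH. It assumes that `nvidia-smi` ships with the NVIDIA driver and `rocm-smi` with AMD ROCm; the function name `detect_gpu_tooling` is hypothetical, and this is not how TWILIGHT's own build script performs detection.

```shell
# Report which GPU management utility, if any, is found on PATH.
# Assumption: nvidia-smi indicates an NVIDIA driver, rocm-smi an AMD ROCm install.
detect_gpu_tooling() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia"
  elif command -v rocm-smi >/dev/null 2>&1; then
    echo "amd"
  else
    echo "none"
  fi
}
```

A result of `none` only means that neither utility was found; it does not prove that no GPU is present.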

Using Conda

TWILIGHT is available on multiple platforms via Conda. See TWILIGHT Bioconda Page for details.

Step 1: Create and activate a Conda environment (ensure Conda is installed first)

conda create -n twilight python=3.11 -y
conda activate twilight
# Set up channels
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict
# Install TWILIGHT
conda install bioconda::twilight

Step 2 (optional): Install TWILIGHT iterative mode

git clone https://github.com/TurakhiaLab/TWILIGHT.git
cd TWILIGHT
bash ./install/installIterative.sh

Using installation script (requires sudo access if certain common libraries are not already installed)

Users without sudo access are advised to install TWILIGHT via Conda or Docker.

Step 1: Clone the repository

git clone https://github.com/TurakhiaLab/TWILIGHT.git
cd TWILIGHT

Step 2: Install dependencies (requires sudo access)

TWILIGHT depends on the following common system libraries, which are typically pre-installed on most development environments:

- wget
- build-essential 
- cmake 
- libboost-all-dev 

It also requires libtbb-dev, which is not always pre-installed on all systems. For users who do not have sudo access and are missing only libtbb-dev, our script builds and installs TBB from source in the local user environment, with no sudo access required.

For Ubuntu users with sudo access, if any of the required libraries are missing, you can install them with:

sudo apt install -y wget build-essential libboost-all-dev cmake libtbb-dev

For Mac users, install dependencies using Homebrew:

xcode-select --install # if not already installed
brew install wget boost cmake tbb
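
To see which of the command-line tools are already present before reaching for sudo, a small helper along these lines may be useful. The function name `missing_tools` is hypothetical, and the check covers executables only; it cannot tell whether the Boost or TBB development libraries are installed.

```shell
# Print every command from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example: missing_tools wget cmake make g++
```

Empty output means every listed command was found.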

Step 3: Build TWILIGHT

Our build script automatically detects the best available compute backend (CPU, NVIDIA GPU, or AMD GPU) and builds TWILIGHT accordingly. Alternatively, users can manually specify the desired target platform.

Automatic build:

bash ./install/buildTWILIGHT.sh

Build for a specific platform:

bash ./install/buildTWILIGHT.sh cuda # For NVIDIA GPUs
bash ./install/buildTWILIGHT.sh hip  # For AMD GPUs

Step 4: The TWILIGHT executable is located in the bin directory and can be run as follows:

cd bin
./twilight --help

Step 5 (optional): Install TWILIGHT iterative mode (ensure Conda is installed first)

# Create and activate a Conda environment 
conda create -n twilight python=3.11 -y
conda activate twilight
# Install Snakemake and tree inference tools
bash ./install/installIterative.sh

Using Dockerfile

The Dockerfile installs all the dependencies and tools needed for TWILIGHT's default and iterative modes.

Step 1: Clone the repository

git clone https://github.com/TurakhiaLab/TWILIGHT.git
cd TWILIGHT

Step 2: Build a docker image (ensure Docker is installed first)

CPU version

cd docker/cpu
docker build -t twilight .

GPU version (using nvidia/cuda as base image)

cd docker/gpu
docker build -t twilight .

Step 3: Start and run docker container

CPU version

docker run --platform=linux/amd64 -it twilight

GPU version

docker run --platform=linux/amd64 --gpus all -it twilight

Step 4: Run TWILIGHT

cd bin
./twilight -h

Run TWILIGHT

TWILIGHT Command Line Interface

For more information about TWILIGHT's options and instructions, see wiki or Help.

cd bin
./twilight -h

Default Mode

Performs a standard progressive alignment using default configurations.

Usage syntax

./twilight -t <tree file> -i <sequence file> -o <output file>

Example

./twilight -t ../dataset/RNASim.nwk -i ../dataset/RNASim.fa -o RNASim.aln
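
Before starting a long run, it can be worth confirming that the tree and the sequence file describe the same number of entries. The sketch below is one way to do that in plain shell; the function names are hypothetical. It assumes unquoted Newick labels (no commas inside names or comments) and relies on the fact that a tree with N leaves contains N - 1 commas. The underlying assumption that default mode expects matching sets is inferred from the existence of the `--prune` option.

```shell
# Count FASTA records (lines starting with '>').
count_sequences() {
  grep -c '^>' "$1"
}

# Count leaves in a Newick tree as (number of commas + 1).
# Assumption: a single tree with no commas inside quoted labels or comments.
count_leaves() {
  echo $(( $(tr -cd ',' < "$1" | wc -c) + 1 ))
}

# Succeed only if the tree and the sequence file have the same number of entries.
counts_match() {
  [ "$(count_leaves "$1")" -eq "$(count_sequences "$2")" ]
}

# Example: counts_match ../dataset/RNASim.nwk ../dataset/RNASim.fa || echo "counts differ"
```

Equal counts do not guarantee that the names match, so this is only a first sanity check.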

Divide-and-Conquer Method

To reduce the CPU’s main memory usage, TWILIGHT divides the tree into subtrees with at most m leaves and aligns the subtrees sequentially. The parameter m is user-defined.

Usage syntax

./twilight -t <tree file> -i <sequence file> -o <output file> -m <maximum subtree size>

Example

./twilight -t ../dataset/RNASim.nwk -i ../dataset/RNASim.fa -o RNASim.aln -m 200
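
When choosing m, a rough feel for how many subtrees to expect can help. With n sequences and at most m leaves per subtree, at least ceil(n / m) subtrees are needed. The helper below (hypothetical name `min_subtrees`) computes that lower bound; the actual number depends on the tree topology and on how TWILIGHT partitions the tree, so it may be higher.

```shell
# Lower bound on the number of subtrees: ceil(n / m), using integer arithmetic.
# $1 = number of sequences (n), $2 = maximum subtree size (m)
min_subtrees() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Example: min_subtrees "$(grep -c '^>' ../dataset/RNASim.fa)" 200
```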

Add New Sequences to an Existing Alignment

For better accuracy, it is recommended to use a tree that includes placements for the new sequences. If no tree is provided, TWILIGHT aligns new sequences to the profile of the entire backbone alignment, which may reduce accuracy. In this case, using the provided Snakemake workflow is advised.

./twilight -a <backbone alignment file> -i <new sequence file> -t <tree with placement of new sequences> -o <path to output file>

Example

./twilight -a ../dataset/RNASim_backbone.aln -i ../dataset/RNASim_sub.fa -t ../dataset/RNASim.nwk -o RNASim.aln

Merge Multiple MSA Files

To merge multiple MSAs, place all MSA files into a single folder.

Usage syntax

./twilight -f <path to the folder> -o <output file>

Example

./twilight -f ../dataset/RNASim_subalignments/ -o RNASim.aln
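
Since every file in the folder is treated as an MSA, a quick consistency check before merging can catch stray or truncated files. The sketch below assumes the MSAs are in FASTA format (as TWILIGHT's output is) and that every regular file in the folder is meant to be merged; the function names are hypothetical. It verifies only that all sequences within a file have the same length.

```shell
# Succeed if every record in a FASTA alignment has the same length.
# Sequences may span several lines; spaces, tabs and carriage returns are ignored.
is_aligned() {
  awk '
    function check() {
      if (have) {
        if (width < 0) width = len
        else if (len != width) bad = 1
      }
    }
    BEGIN { width = -1 }
    /^>/ { check(); have = 1; len = 0; next }
    { gsub(/[ \t\r]/, ""); len += length($0) }
    END { check(); if (bad || width < 0) exit 1 }
  ' "$1"
}

# Report every regular file in a folder that is not a consistent alignment.
check_msa_folder() {
  status=0
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    is_aligned "$f" || { echo "not aligned: $f"; status=1; }
  done
  return $status
}

# Example: check_msa_folder ../dataset/RNASim_subalignments
```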

Flexible Tree Support

Prunes tips that are not present in the raw sequence file. This is useful when working with a large tree but only aligning a subset of sequences, without needing to re-estimate the guide tree. Outputting the pruned tree is also supported.

Usage syntax

./twilight -t <large tree file> -i <subset of raw sequences> -o <output file> --prune [--write-prune]

Example

./twilight -t ../dataset/RNASim.nwk -i ../dataset/RNASim_sub.fa -o RNASim_sub.aln --prune --write-prune
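
To preview which tips have no matching sequence, the tree's leaf names can be compared with the FASTA record names. The sketch below is an independent illustration, not TWILIGHT's pruning logic. It assumes unquoted Newick labels without spaces and takes the first word after `>` as the sequence name; whether TWILIGHT matches on the first word or on the full header is not stated here. All function names are hypothetical.

```shell
# Print leaf names from a Newick file, one per line, sorted.
# Assumption: unquoted labels; a leaf label directly follows '(' or ','.
tree_leaves() {
  tr -d ' \t\r\n' < "$1" | grep -oE '[(,][^(),:;]+' | sed 's/^[(,]//' | sort
}

# Print FASTA record names (first word after '>'), one per line, sorted.
fasta_names() {
  sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$1" | sort
}

# Print leaves that appear in the tree but not in the sequence file.
leaves_without_sequence() {
  tmpdir=$(mktemp -d)
  tree_leaves "$1" > "$tmpdir/leaves"
  fasta_names "$2" > "$tmpdir/names"
  comm -23 "$tmpdir/leaves" "$tmpdir/names"
  rm -rf "$tmpdir"
}

# Example: leaves_without_sequence ../dataset/RNASim.nwk ../dataset/RNASim_sub.fa
```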

Snakemake Workflow

For more information about TWILIGHT's options and instructions, see wiki or Help. To set up the environment and install external tools, see the optional iterative-mode step under Installation.

Setup

  • For users who install TWILIGHT via Conda, please replace the executable path "../bin/twilight" with "twilight" in config.yaml. Feel free to switch to a more powerful tree tool if available, such as replacing "raxmlHPC" with "raxmlHPC-PTHREADS-AVX2" for better performance.
  • Note that since some tree-building tools can’t automatically detect the sequence type, specifying datatype is required. Use TYPE=n for nucleotide sequences or TYPE=p for protein sequences.
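
For Conda installs, the path replacement described above can be scripted. The layout of config.yaml is not shown here, so this sketch does a plain text substitution rather than relying on a particular key name. The function name `use_conda_twilight` is hypothetical.

```shell
# Replace the relative executable path with the Conda-installed command name.
# Writes to a temporary file first so the edit behaves the same with GNU and BSD sed.
use_conda_twilight() {
  config=$1
  sed 's|\.\./bin/twilight|twilight|g' "$config" > "$config.tmp" &&
    mv "$config.tmp" "$config"
}

# Example: use_conda_twilight config.yaml
```

Keeping a backup copy of config.yaml before editing makes it easy to revert.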

Enter the workflow directory and run snakemake to view the help message.

cd workflow
snakemake
# or, for Snakemake versions that require specifying total number of cores:
snakemake --cores 1

Iterative Mode

TWILIGHT's iterative mode estimates guide trees using external tools.

Supported tree inference tools:

  • Initial guide tree: parttree, maffttree, mashtree
  • Intermediate iterations (optimized for speed): rapidnj, fasttree
  • Final tree (optimized for quality): fasttree, raxml, iqtree

Usage syntax

snakemake [--cores <num threads>] --config TYPE=VALUE SEQ=VALUE OUT=VALUE [OPTION=VALUE ...]

Example

  • Using default configurations
snakemake --cores 8 --config TYPE=n SEQ=../dataset/RNASim.fa OUT=RNASim.aln
  • Generating a final tree from the completed MSA
snakemake --cores 8 --config TYPE=n SEQ=../dataset/RNASim.fa OUT=RNASim.aln FINALTREE=fasttree
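
Because TYPE must be supplied explicitly, a rough guess from the file contents can be handy when scripting many runs. The heuristic below is an assumption made for illustration, not part of TWILIGHT: it reports `n` when at least 90% of the residues are A, C, G, T, U or N, and `p` otherwise. Nucleotide files with many IUPAC ambiguity codes may be misclassified, so treat the result as a hint. The function name `guess_type` is hypothetical.

```shell
# Guess the sequence type of a FASTA file: prints "n" (nucleotide) or "p" (protein).
# Heuristic: nucleotide if at least 90% of the letters are A, C, G, T, U or N.
guess_type() {
  awk '
    /^>/ { next }
    {
      seq = toupper($0)
      gsub(/[^A-Z]/, "", seq)            # drop gaps, whitespace, carriage returns
      total += length(seq)
      nuc += gsub(/[ACGTUN]/, "", seq)   # gsub returns the number of matches
    }
    END {
      if (total > 0 && nuc * 100 >= total * 90) print "n"
      else print "p"
    }
  ' "$1"
}

# Example:
#   snakemake --cores 8 --config TYPE="$(guess_type ../dataset/RNASim.fa)" \
#     SEQ=../dataset/RNASim.fa OUT=RNASim.aln
```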

Add New Sequences to an Existing Alignment

TWILIGHT aligns new sequences to the profile of the backbone alignment, infers their placement with external tools, and then refines the alignment using the inferred tree.

Usage syntax

snakemake [--cores <num threads>] --config TYPE=VALUE SEQ=VALUE OUT=VALUE ALN=VALUE [OPTION=VALUE ...]

Example

  • The backbone alignment is accompanied by a tree.
snakemake --cores 8 --config TYPE=n SEQ=../dataset/RNASim_sub.fa OUT=RNASim.aln ALN=../dataset/RNASim_backbone.aln TREE=../dataset/RNASim_backbone.nwk
  • The backbone tree is unavailable; estimate it using external tools and generate a final tree after alignment.
snakemake --cores 8 --config TYPE=n SEQ=../dataset/RNASim_sub.fa OUT=RNASim.aln ALN=../dataset/RNASim_backbone.aln FINALTREE=fasttree

Contributions

We welcome contributions from the community to enhance the capabilities of TWILIGHT. If you encounter any issues or have suggestions for improvement, please open an issue on the TWILIGHT GitHub page. For general inquiries and support, reach out to our team.

Citing TWILIGHT

If you use TWILIGHT in your research or publications, we kindly request that you cite the following paper:

Yu-Hsiang Tseng, Sumit Walia, Yatish Turakhia, "Ultrafast and ultralarge multiple sequence alignments using TWILIGHT", Bioinformatics, Volume 41, Issue Supplement_1, July 2025, Pages i332–i341, doi: 10.1093/bioinformatics/btaf212
