Multiomics-Analytics-Group/InstaNexus

A de novo protein sequencing workflow

Introduction

InstaNexus is a generalizable, end-to-end workflow for direct protein sequencing, tailored to reconstruct full-length protein therapeutics such as antibodies and nanobodies. It integrates AI-driven de novo peptide sequencing with optimized assembly and scoring strategies to maximize accuracy, coverage, and functional relevance.

This pipeline enables robust reconstruction of critical protein regions, advancing applications in therapeutic discovery, immune profiling, and protein engineering.


Features

  • 🧬 Supports De Bruijn Graph and Greedy-based assembly
  • ⚗️ Handles multiple protease digestions (Trypsin, LysC, GluC, etc.)
  • 🧹 Integrated contaminant removal and confidence filtering
  • 🧩 Clustering, alignment, and consensus sequence reconstruction
  • 🔗 Integrates with external tools such as MMseqs2 and Clustal Omega
  • 📊 Output-ready for downstream analysis and visualization
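To illustrate the De Bruijn graph idea named in the feature list, the sketch below builds a k-mer graph from a few overlapping peptides and walks it into a single contig. It is a toy example under stated assumptions, not the InstaNexus implementation: the function names, the peptide strings, and the choice of k = 4 are all hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(peptides, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for pep in peptides:
        for i in range(len(pep) - k + 1):
            kmer = pep[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while each node has exactly one successor."""
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited:  # stop rather than loop forever on a cycle
            break
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


# Hypothetical overlapping peptides, as might come from different protease digests.
peptides = ["EVQLVESGG", "VESGGGLVQ", "GGLVQPGGS"]
graph = build_de_bruijn(peptides, k=4)
contig = walk_unambiguous(graph, start="EVQ")
print(contig)  # EVQLVESGGGLVQPGGS
```

Real data produces branching nodes (several successors) wherever peptides disagree or a (k-1)-mer repeats, which is where scoring and confidence filtering become necessary; this sketch simply stops at the first branch.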

Workflow Diagram

[Figure: InstaNexus workflow diagram]


Repository Structure

| File / Folder | Description |
|---|---|
| environment.linux.yml | Conda environment definition with required dependencies for Linux |
| environment.osx-arm64.yaml | Conda environment definition with required dependencies for macOS (arm64) |
| README.md | Project documentation |
| examples/ | |
| fasta/ | Known contaminants and example FASTA sequences |
| images/ | Logos and workflow diagrams (PNG, SVG, PDF) |
| inputs/ | Example datasets (e.g., BSA, antibody, nanobody) |
| json/ | JSON metadata for peptide color coding and analysis |
| notebooks/ | Jupyter notebooks for visualization and exploration |
| src/ | Core scripts to run the InstaNexus pipeline |

Prerequisites and Installation

Important

MMseqs2 and Clustal Omega are available through Conda, but compatibility depends on your system architecture.


Getting Started

Follow these steps to clone the repository and set up the environment using Conda:

1. Clone the repository

Clone the repository and change into its directory:

git clone https://github.com/your-username/instanexus.git
cd instanexus

2. Create the conda environment

Create the instanexus Conda environment on Linux:

conda env create -f environment.linux.yml

Create the instanexus Conda environment on macOS (arm64):

conda env create -f environment.osx-arm64.yaml
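If it is unclear which of the two environment files applies, a small helper along these lines can pick one from the current platform. This is a sketch rather than part of the repository: the function name `pick_environment_file` is hypothetical, and platforms other than Linux and macOS on arm64 are treated as unsupported only because no environment file for them is listed above.

```python
import platform


def pick_environment_file(system=None, machine=None):
    """Return the environment file matching the platform, or None if none fits."""
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Linux":
        return "environment.linux.yml"
    if system == "Darwin" and machine == "arm64":
        return "environment.osx-arm64.yaml"
    return None


env_file = pick_environment_file()
if env_file is None:
    print("No matching environment file for this platform.")
else:
    print(f"conda env create -f {env_file}")
```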

3. Activate the environment

conda activate instanexus
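Because MMseqs2 and Clustal Omega availability depends on the system architecture, it can help to confirm that both tools are reachable once the environment is active. The check below is a minimal sketch: it assumes the executables are installed under their usual names, `mmseqs` and `clustalo`, and the helper name `find_tools` is hypothetical.

```python
import shutil


def find_tools(names=("mmseqs", "clustalo")):
    """Return a dict mapping each executable name to its path, or None if absent."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools().items():
    status = path if path else "NOT FOUND on PATH"
    print(f"{name}: {status}")
```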

Hyperparameter Optimization

To launch the hyperparameter grid search, run the following command from the project root (the folder containing src/ and json/):

python -m src.opt.gridsearch

Adjusting Parameters

Grid search parameters for both the De Bruijn graph (dbg) and Greedy (greedy) assembly methods are defined in:

json/gridsearch_params.json

To test more (or fewer) combinations, edit the arrays for each parameter in this file.
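To see how the array lengths affect the number of runs, the sketch below expands per-parameter arrays into individual combinations. The parameter names (`kmer_size`, `min_overlap`) and the file layout are hypothetical and may not match the real json/gridsearch_params.json. It also assumes the search takes the full Cartesian product of the arrays, which the text above does not confirm.

```python
import itertools
import json

# Hypothetical content: the real keys in json/gridsearch_params.json may differ.
example_json = """
{
  "dbg":    {"kmer_size": [6, 7], "min_overlap": [3, 4, 5]},
  "greedy": {"min_overlap": [3, 4]}
}
"""


def expand_grid(param_arrays):
    """Return one dict per combination of the values in each parameter array."""
    names = list(param_arrays)
    value_lists = (param_arrays[name] for name in names)
    return [dict(zip(names, values)) for values in itertools.product(*value_lists)]


params = json.loads(example_json)
for method, arrays in params.items():
    combos = expand_grid(arrays)
    print(method, len(combos), "combinations, first:", combos[0])
```

Under that assumption the number of runs is the product of the array lengths, so adding one value to each of several arrays can increase the search time substantially.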

License

This project is licensed under the MIT License.


Acknowledgments

InstaNexus was developed at DTU Biosustain and DTU Bioengineering.

We are grateful to the DTU Bioengineering Proteomics Core Facility for maintenance and operation of mass spectrometry instrumentation.

We also thank the Informatics Platform at DTU Biosustain for their support during the development and optimization of InstaNexus.

Special thanks to the users and developers of MMseqs2, Clustal Omega, and InstaNovo.


References

  1. Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology 35, 1026–1028 (2017). https://doi.org/10.1038/nbt.3988
  2. Sievers, F., et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular Systems Biology 7, 539 (2011). https://doi.org/10.1038/msb.2011.75
  3. Eloff, K., et al. InstaNovo enables diffusion-powered de novo peptide sequencing in large-scale proteomics experiments. Nature Machine Intelligence, 1–15 (2025).

Citation

If you find this project useful in your research or work, please cite it as:

Reverenna M., Nielsen M. W., Wolff D. S., Lytra E., Colaianni P. D., Ljungars A., Laustsen A. H., Schoof E. M., Van Goey J., Jenkins T. P., Lukassen M. V., Santos A., Kalogeropoulos K. (2025). Generalizable direct protein sequencing with InstaNexus [Preprint]. bioRxiv. https://doi.org/10.1101/2025.07.25.666861
