First-place winning solution at Hack jak Brno hackathon in 2024 for rapid detection of antibiotic resistance.

YannickGibson/hackjakbrno

🧬 DNA SEQUENCE ANALYZER

🧠 Overview

The sequencer_handler.py module continuously watches for new files produced by the sequencer. When a new file appears, sequencer_handler.py loads it and computes a vector representation (embedding) of each DNA segment it contains.
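A minimal sketch of this step, under stated assumptions: the embedding model and the sequencer's file format are not described in this README, so the code below polls a directory with the standard library and uses normalized 3-mer counts as a toy stand-in embedding. The names `embed_segment`, `find_new_files`, `watch`, and the `handle` callback are hypothetical; a learned DNA language model (such as DNABERT-2, listed in the references) would take the place of `embed_segment`.

```python
import os
import time
from itertools import product
from typing import Callable

K = 3  # hypothetical k-mer size for the toy embedding
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def embed_segment(segment: str) -> list[float]:
    """Toy embedding: L2-normalized k-mer counts (stand-in for a learned model)."""
    vec = [0.0] * len(KMERS)
    seq = segment.upper()
    for i in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[i:i + K])
        if idx is not None:  # skip k-mers containing N or other non-ACGT symbols
            vec[idx] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec


def find_new_files(directory: str, seen: set[str]) -> list[str]:
    """Return files in `directory` not seen before and record them as seen."""
    new_paths = []
    with os.scandir(directory) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_file() and entry.path not in seen:
                seen.add(entry.path)
                new_paths.append(entry.path)
    return new_paths


def watch(directory: str, handle: Callable[[str], None], interval: float = 5.0) -> None:
    """Poll `directory` forever, passing each newly appeared file to `handle`."""
    seen: set[str] = set()
    while True:
        for path in find_new_files(directory, seen):
            handle(path)  # hypothetical: parse segments, embed them, store vectors
        time.sleep(interval)
```

Polling is the simplest portable option; an event-based watcher (for example the `watchdog` package) could replace it. Either way, a real handler should make sure the sequencer has finished writing a file before reading it.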

In parallel, solver.py retrieves the stored sequences whose vectors are most similar to the gene currently being analyzed. It then runs the Smith–Waterman local alignment algorithm on these candidates to determine whether any of them actually contains the gene.
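To make the verification step concrete, here is a minimal Smith–Waterman sketch with a linear gap penalty that returns the best local alignment score. The scoring values (match +2, mismatch -1, gap -2) and the `contains_gene` decision rule, which accepts a candidate when its score reaches a fraction of a perfect full-length match, are illustrative assumptions, not the repository's actual parameters.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of `a` vs `b` (Smith-Waterman, linear gap penalty).

    Only two rows of the dynamic-programming matrix are kept, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap in b
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap in a
            # Flooring at 0 lets an alignment restart anywhere: this makes it local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def contains_gene(read: str, gene: str, min_fraction: float = 0.9, match: int = 2) -> bool:
    """Hypothetical rule: accept if the score reaches `min_fraction` of a perfect match."""
    perfect = match * len(gene)
    return smith_waterman(read, gene, match=match) >= min_fraction * perfect
```

Smith–Waterman takes time proportional to len(a) × len(b), which is why it is worth running only on the candidates returned by the similarity search.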

The output of solver.py reports which genes were identified in a given bacterial sample, referenced by its barcode.
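The exact output format is not specified here. As one hypothetical shape, detections could be collected as `(barcode, gene)` pairs and grouped per barcode; the function name `summarize_detections` and the example barcodes and gene names are illustrative only.

```python
from collections import defaultdict
from typing import Iterable


def summarize_detections(detections: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (barcode, gene) pairs into {barcode: sorted unique genes}."""
    genes_by_barcode: defaultdict[str, set[str]] = defaultdict(set)
    for barcode, gene in detections:
        genes_by_barcode[barcode].add(gene)  # a set drops repeated detections
    return {barcode: sorted(genes) for barcode, genes in sorted(genes_by_barcode.items())}


if __name__ == "__main__":
    # Illustrative detections; real barcodes and gene names come from the data.
    hits = [("barcode01", "geneB"), ("barcode02", "geneA"),
            ("barcode01", "geneA"), ("barcode01", "geneB")]
    for barcode, genes in summarize_detections(hits).items():
        print(f"{barcode}: {', '.join(genes)}")
```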

All control over the vector database goes through iris_database.py, which provides an interface for querying and modifying the database. Internally, it uses the InterSystems IRIS vector database to store, query, and manage the vector data.
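The interface of iris_database.py is not shown in this README, so the following is only a hypothetical, dependency-free stand-in that mimics the core operation such a vector store provides: storing sequences with their vectors and returning the k entries most similar to a query by cosine similarity. The class and method names are invented for illustration and are not the IRIS API; in IRIS itself this ranking would typically be written in SQL with its vector functions (such as `VECTOR_COSINE`), letting the database do the work.

```python
import heapq
import math
from typing import Iterable, Sequence

Row = tuple[str, str, list[float]]  # (key, DNA sequence, embedding vector)


class InMemoryVectorStore:
    """Hypothetical in-memory stand-in for an IRIS-backed vector table."""

    def __init__(self, rows: Iterable[Row] = ()) -> None:
        self._rows: list[Row] = []
        for key, sequence, vector in rows:
            self.insert(key, sequence, vector)

    def insert(self, key: str, sequence: str, vector: Sequence[float]) -> None:
        self._rows.append((key, sequence, list(vector)))

    @staticmethod
    def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    def top_k(self, query: Sequence[float], k: int = 5) -> list[tuple[float, str, str]]:
        """Return up to k (similarity, key, sequence) tuples, most similar first.

        This is an exhaustive scan; a real vector database may use an index instead.
        """
        scored = ((self._cosine(query, vec), key, seq) for key, seq, vec in self._rows)
        return heapq.nlargest(k, scored, key=lambda item: item[0])
```

Cosine similarity is a common choice for comparing embeddings; whether the project uses it or another measure, such as the dot product, is not stated in the README.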

[Figure: DNA_diagram]

💻 Installation

Install dependencies:

pip install -r requirements

Download and set up the InterSystems IRIS database container: https://github.com/intersystems-community/hackathon-2024/tree/main?tab=readme-ov-file

Run both scripts; they are meant to run at the same time (for example, in two separate terminals):

python sequencer_handler.py
python solver.py

⏱ Comparison

This approach makes it possible to compare a pipeline that relies only on standard sequence alignment with our vector-based solution. Alignment algorithms such as Smith–Waterman are precise, but their cost grows with the product of the two sequence lengths, so running them against every sequence becomes slow on large datasets. Our vector-based solution first uses vector representations to quickly narrow the search to a small set of similar candidates, and only then applies the precise alignment (Smith–Waterman) to verify the matches.
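A back-of-the-envelope operation count illustrates why the two-stage design helps. It is not a measurement, and every size below (100,000 reads, 1,000-base reads and gene, a 768-dimensional embedding, k = 10 candidates) is hypothetical. It counts Smith–Waterman matrix cells for the alignment-only baseline against one vector comparison per stored read plus alignment of only the k candidates, and it deliberately ignores the cost of computing the embeddings and any speedup from an approximate vector index.

```python
def alignment_only_cells(n_reads: int, read_len: int, gene_len: int) -> int:
    """Smith-Waterman DP cells when every read is aligned against the gene."""
    return n_reads * read_len * gene_len


def two_stage_ops(n_reads: int, dim: int, k: int, read_len: int, gene_len: int) -> int:
    """Exhaustive vector scan (dim multiply-adds per read) + alignment of k candidates."""
    return n_reads * dim + k * read_len * gene_len


if __name__ == "__main__":
    # Hypothetical workload, chosen only for illustration.
    baseline = alignment_only_cells(n_reads=100_000, read_len=1_000, gene_len=1_000)
    two_stage = two_stage_ops(n_reads=100_000, dim=768, k=10, read_len=1_000, gene_len=1_000)
    print(f"alignment only: {baseline:,} cells")
    print(f"two-stage:      {two_stage:,} operations")
    print(f"ratio:          {baseline / two_stage:.0f}x")
```

With these made-up sizes the two-stage count is roughly three orders of magnitude smaller. A DP cell and a multiply-add do not cost exactly the same, so the ratio only indicates the order of magnitude, not an expected speedup.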

Times will be added here.

📘 References

Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol. 1981 Mar 25;147(1):195-7. doi: 10.1016/0022-2836(81)90087-5. PMID: 7265238. [https://pubmed.ncbi.nlm.nih.gov/7265238/]

Zhou Z, Ji Y, Li W, Dutta P, Davuluri R, Liu H. DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome. arXiv:2306.15006. 2024. [https://arxiv.org/abs/2306.15006]

Holur P, Enevoldsen KC, Rajesh S, Mboning L, Georgiou T, Bouchard LS, Pellegrini M, Roychowdhury V. Embed-Search-Align: DNA Sequence Alignment using Transformer Models. arXiv:2309.11087. 2024. [https://arxiv.org/abs/2309.11087]
