Quik Barcode Calling

DNA barcodes are short DNA strings that are regularly used as tags in pooled sequencing experiments to identify reads originating from the same sample. A crucial task in the subsequent analysis of pooled sequences is barcode calling: identifying the corresponding barcode for each read. This task is computationally challenging when the probability of synthesis and sequencing errors is high, as in photolithographic microarray synthesis. Here, we propose a simple yet highly effective barcode calling approach that uses a filtering technique based on precomputed k-mer lists. It is slightly more accurate than the state-of-the-art approach and more than 500 times faster, which allows barcode calling for one million barcodes and one billion reads per day on a server GPU.
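
As a rough illustration of what filtering with precomputed k-mer lists can look like, the following Python sketch builds an index from each k-mer to the barcodes containing it, then ranks barcodes by the number of k-mers they share with a read. It is a generic shared-k-mer filter and not the implementation in src/; the function names, the choice `k=4`, and the toy barcodes are assumptions made for the example.

```python
from collections import defaultdict


def kmers(s, k):
    """Return the set of all length-k substrings of s."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def build_kmer_index(barcodes, k):
    """Precompute, for every k-mer, the list of barcode indices containing it."""
    index = defaultdict(list)
    for idx, barcode in enumerate(barcodes):
        for kmer in kmers(barcode, k):
            index[kmer].append(idx)
    return index


def candidate_barcodes(read, index, k, max_candidates):
    """Rank barcodes by the number of k-mers they share with the read."""
    votes = defaultdict(int)
    for kmer in kmers(read, k):
        for idx in index.get(kmer, ()):
            votes[idx] += 1
    # Most shared k-mers first; ties are broken by the smaller barcode index.
    ranked = sorted(votes, key=lambda idx: (-votes[idx], idx))
    return ranked[:max_candidates]


# Toy data (hypothetical, not taken from the data/ directory).
barcodes = ["ACGTACGT", "TTTTGGGG", "ACGTTTTT"]
index = build_kmer_index(barcodes, k=4)
candidates = candidate_barcodes("ACGAACGT", index, k=4, max_candidates=2)
print(candidates)
```

Only the surviving candidates would then need an exact distance computation, which is where a filter of this kind saves time.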

Repository Structure

📦 Quik
├── 📁 data/ # Barcode and read files required to reproduce our experiments
├── 📁 src/ # Source code for analyses and experiments 
├── 📁 results/ # Output data, plots, or tables 
├── 📄 README.md # This file 
├── 📄 CMakeLists.txt # CMake file for installation
└── 📄 LICENSE # License file

Installation

The software has been developed for Linux and has been tested on an Ubuntu 24.04 system. The following steps are required for installation:

Install software packages:

sudo apt install git cmake g++ libomp-dev nvidia-cuda-toolkit

Check out the project:

git clone https://github.com/uni-halle/quik.git

Compile the source files. Quik comes with a CMake script that should work on various operating systems. CMake automatically detects whether all mandatory and optional libraries are available on your system.
```
cd quik
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make
```

The build directory should now contain several binaries, including benchmark_barcode_calling, which is used to compare the accuracy and efficiency of several of our barcode calling approaches.

Benchmark Experiment

After building the code, the sample binary benchmark_barcode_calling can be used to reproduce some of the results from our article.

The benchmark program tries to assign each read in the <read_file> to some barcode in the <barcode_file>. To assess the accuracy of the barcode assignment, a <label_file> is required in which the original barcode is specified for each read.
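
To make the role of the label file concrete, here is a minimal Python sketch of how an assignment could be scored against the labels. It assumes that precision is the fraction of assigned reads that were assigned correctly and that recall is the fraction of all reads that were assigned correctly; the benchmark's exact definitions may differ. The function name `score_assignment` and the use of `None` for rejected reads are hypothetical.

```python
def score_assignment(assigned, labels):
    """Score called barcodes against ground-truth labels.

    assigned[i] is the barcode index called for read i, or None if the
    read was rejected. labels[i] is the index of the true barcode.
    """
    if len(assigned) != len(labels):
        raise ValueError("assigned and labels must have the same length")
    n_assigned = sum(a is not None for a in assigned)
    n_correct = sum(a is not None and a == l for a, l in zip(assigned, labels))
    # Assumed definitions: precision over assigned reads, recall over all reads.
    precision = n_correct / n_assigned if n_assigned else 0.0
    recall = n_correct / len(labels) if labels else 0.0
    return precision, recall


# Toy example: four reads, one rejected, one assigned to the wrong barcode.
labels = [0, 2, 1, 1]
assigned = [0, 2, None, 0]
precision, recall = score_assignment(assigned, labels)
print(precision, recall)
```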

The benchmark program runs several of our k-mer distance barcode calling approaches and reports the associated precision and recall as well as the running time in milliseconds per read. Usage:

benchmark_barcode_calling <barcode_file> <read_file> <label_file> <distance_measure> <rejection_threshold>

| Argument | Description |
|---|---|
| `<barcode_file>` | Text file with one barcode per line: `barcode[0]` ... `barcode[n-1]`. |
| `<read_file>` | Text file with one read per line: `read[0]` ... `read[m-1]`. |
| `<label_file>` | Text file with m lines of integers: `label[0]` ... `label[m-1]`. The integer `label[i]` is associated with `read[i]` and gives the index of the barcode from which this read originated. Thus, `read[i]` originated from `barcode[label[i]]`. |
| `<distance_measure>` | Distance measure between reads and barcodes. Must be one of the following: `LEVENSHTEIN`, `SEQUENCE_LEVENSHTEIN`. Has an effect on the accuracy of the barcode calling process. |
| `<rejection_threshold>` | If a read's distance to the closest barcode is larger than this integer, the read is rejected and remains unassigned. Has a large effect on the accuracy of the barcode calling process. |

The data directory contains sample files which we used to produce the results in our journal article.
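
To illustrate how the distance measure and the rejection threshold interact, the following Python sketch calls a barcode by brute force using the plain Levenshtein distance and rejects the read when even the closest barcode is too far away. It is a simplified stand-in and not the k-mer based code in src/; `SEQUENCE_LEVENSHTEIN` is not covered, and the function names and toy barcodes are assumptions.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # match or substitute
            ))
        prev = curr
    return prev[-1]


def call_barcode(read, barcodes, rejection_threshold):
    """Return the index of the closest barcode, or None if the read is rejected."""
    distances = [levenshtein(read, b) for b in barcodes]
    best = min(range(len(barcodes)), key=distances.__getitem__)
    # A read is rejected only if its distance is larger than the threshold.
    return best if distances[best] <= rejection_threshold else None


# Toy data (hypothetical, not taken from the data/ directory).
barcodes = ["ACGTACGT", "TTTTGGGG"]
print(call_barcode("ACGAACGT", barcodes, rejection_threshold=2))
print(call_barcode("CCCCCCCC", barcodes, rejection_threshold=2))
```

A lower threshold rejects more reads and tends to trade recall for precision, which is why this argument has such a large effect on the reported accuracy.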

License

This project is licensed under the GPL3. See the LICENSE file for details.

Citation

If you use this repository or the associated article, please cite it as follows:

@article{FastBarcodeCallingBasedOnKMerDistances2025,
  author  = {Riko Corvin Uphoff and Steffen Schüler and Ivo Grosse and Matthias Müller-Hannemann},
  title   = {Fast barcode calling based on k-mer distances},
  journal = {to be announced},
  year    = {2025},
  volume  = {X},
  pages   = {Y-Z},
  doi     = {DOI link}
}
