Serpent is an exploration into DNA and RNA sequences, nucleotide bases, codons, amino acids and genome data.
I started this project because, for about two decades, I have wanted to explore DNA data in order to learn about, and maybe invent, compression algorithms for it.
Install serpent with pip install serpent, or develop with pdm.
- serpent cat: concatenate and print FASTA files
- serpent find: find FASTA files in directories
- serpent find -s: find and print FASTA sequences in files and directories
- serpent encode: convert data into different encoded representations
- serpent decode: map codons into numbers 0...64
- serpent ac: print and plot autocorrelation on DNA and RNA sequences
- serpent fft: plot FFTs on DNA and RNA sequences
- serpent hist: plot histogram statistics
- serpent image: visualise DNA and RNA data as images
- serpent seq: plot sequence count statistics
- serpent codons: print codon statistics
- serpent pep: print peptide statistics
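
To give a feel for what numbering and counting codons involves, here is a minimal Python sketch. It is not serpent's implementation: the base order `ACGT`, the resulting numbering 0 to 63, and the helper names `split_codons`, `encode_codons` and `codon_counts` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"  # assumed base order; serpent's own ordering may differ

# Number the 64 codons 0..63 by reading each codon as a 3-digit base-4 number.
CODON_INDEX = {"".join(c): i for i, c in enumerate(product(BASES, repeat=3))}


def split_codons(seq):
    """Split a sequence into non-overlapping codons, dropping a trailing partial codon."""
    seq = seq.upper().replace("U", "T")  # treat RNA input as DNA
    end = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, end, 3)]


def encode_codons(seq):
    """Map each codon to its index; codons with unknown bases (e.g. N) are skipped."""
    return [CODON_INDEX[c] for c in split_codons(seq) if c in CODON_INDEX]


def codon_counts(seq):
    """Count how often each codon occurs in the sequence."""
    return Counter(split_codons(seq))


print(encode_codons("ATGGCCTAA"))  # [14, 37, 48]
print(codon_counts("ATGATGTAA").most_common(1))  # [('ATG', 2)]
```

Reading frames, ambiguity codes and stop codons all need more care than this sketch gives them.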
See serpent -h for all subcommands and serpent <subcommand> -h for options!
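
As a rough idea of what autocorrelation on a nucleotide sequence can mean, the sketch below computes, for each lag, the fraction of positions where a base equals the base `lag` positions later. This match-based definition and the function name `base_autocorrelation` are assumptions for illustration; serpent ac may compute something different.

```python
def base_autocorrelation(seq, max_lag):
    """Return the match fraction between seq and itself shifted by lag, for lag = 1..max_lag."""
    seq = seq.upper()
    result = []
    for lag in range(1, max_lag + 1):
        pairs = len(seq) - lag  # number of comparable positions at this lag
        if pairs <= 0:
            break
        matches = sum(1 for i in range(pairs) if seq[i] == seq[i + lag])
        result.append(matches / pairs)
    return result


# A perfectly periodic toy sequence peaks at lag 3.
print(base_autocorrelation("ACGACGACGACG", 4))  # [0.0, 0.0, 1.0, 0.0]
```

In real coding regions a weaker peak at lags that are multiples of three is a commonly reported effect of codon structure.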
Get some sample data from NCBI Datasets. I recommend starting with viral, bacterial or archaeal genomic data, as these genomes are much smaller than those of plants or animals.
- National Center for Biotechnology Information
- Datasets - NCBI - NLM
- RefSeq: NCBI Reference Sequence Database
- Home - Nucleotide - NCBI
- Home - Protein - NCBI
- Genome - NCBI - NLM
For example, a SARS-CoV-2 genome is only about 29.9 kb long!
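
Once you have downloaded a genome, a quick sanity check is to read the FASTA records and print their lengths. The following is a minimal sketch of a FASTA reader in plain Python, not serpent's own parser; the function name `read_fasta` and the sample records are made up for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence data may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical sample data standing in for a downloaded file.
sample = """>seq1 hypothetical example record
ACGTACGT
ACGT
>seq2
GGCC
""".splitlines()

for name, seq in read_fasta(sample):
    print(name, len(seq))
```

For a real download, pass an open file object instead of `sample`, since a text file iterates line by line in the same way.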