SpecMER: Fast Protein Generation with K-mer Guided Speculative Decoding

SpecMER is a speculative decoding framework for accelerated protein sequence generation using k-mer distributions from multiple sequence alignments (MSAs).

Key Components

Core Scripts

`kmer.py`

K-mer analysis and distribution generation from MSA files.

Key Functions:

read_msa_file() - Parse MSA files (FASTA/A2M formats)
extract_kmers() - Extract k-mers from protein sequences
analyze_kmers() - Generate k-mer frequency statistics
create_softmax_distribution() - Convert frequencies to probability distributions
create_distributions() - Generate normalized and raw k-mer distributions
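As a rough illustration of what k-mer extraction and a softmax conversion can look like, here is a minimal sketch. The names `kmers_of`, `kmer_counts`, and `softmax_over_frequencies` are hypothetical and are not the functions in `kmer.py`. Applying the softmax to relative frequencies with a `temperature` argument is also an assumption, since the script may normalize differently.

```python
import math
from collections import Counter


def kmers_of(sequence, k):
    """Return every overlapping substring of length k (empty if too short)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_counts(sequences, k):
    """Count k-mers across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(kmers_of(seq, k))
    return counts


def softmax_over_frequencies(counts, temperature=1.0):
    """Turn raw counts into probabilities via a softmax over relative frequencies.

    Assumption: the softmax input is count / total. Other choices (log-counts,
    raw counts) would give a differently shaped distribution.
    """
    total = sum(counts.values())
    freqs = {kmer: c / total for kmer, c in counts.items()}
    peak = max(freqs.values())  # subtract the max for numerical stability
    weights = {kmer: math.exp((f - peak) / temperature) for kmer, f in freqs.items()}
    norm = sum(weights.values())
    return {kmer: w / norm for kmer, w in weights.items()}


counts = kmer_counts(["MSKGE", "MSKGA"], k=3)
dist = softmax_over_frequencies(counts)
```

In this toy input, `MSK` and `SKG` each occur twice and therefore receive more probability mass than `KGE` or `KGA`.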

`protein.py`

Protein configuration management for different protein targets.

Features:

ProteinConfig dataclass for protein-specific parameters
get_protein_config() - Retrieve configurations for supported proteins:
- GFP (context_length=20, max_tokens=219)
- RBP1 (context_length=10, max_tokens=42)
- CBS (context_length=50, max_tokens=501)
- ADRB2 (context_length=40, max_tokens=373)
- F7YBW8, GB1, Q59976
Integration with MLP evaluation models
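A configuration lookup of this kind can be sketched as follows. The class name `ProteinConfigSketch`, its three fields, and the helper `lookup_protein_config` are illustrative assumptions, and the real `ProteinConfig` may carry additional fields. Only the four proteins whose `context_length` and `max_tokens` are listed above are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinConfigSketch:
    # Hypothetical stand-in for ProteinConfig; the field set is assumed.
    name: str
    context_length: int
    max_tokens: int


# Values copied from the list above; other proteins are omitted because
# their max_tokens values are not stated there.
_CONFIGS = {
    "GFP": ProteinConfigSketch("GFP", context_length=20, max_tokens=219),
    "RBP1": ProteinConfigSketch("RBP1", context_length=10, max_tokens=42),
    "CBS": ProteinConfigSketch("CBS", context_length=50, max_tokens=501),
    "ADRB2": ProteinConfigSketch("ADRB2", context_length=40, max_tokens=373),
}


def lookup_protein_config(name):
    """Return the config for a protein name, case-insensitively."""
    try:
        return _CONFIGS[name.upper()]
    except KeyError:
        known = ", ".join(sorted(_CONFIGS))
        raise ValueError(f"Unknown protein {name!r}; known: {known}") from None


gfp = lookup_protein_config("gfp")
```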

`run_kmer_specmer.py`

Main experimental runner for SpecMER protein generation.

Key Function:

run_kmer_specme_experiment() - Execute speculative decoding experiments

Parameters:

protein_name - Target protein (GFP, RBP1, CBS, etc.)
num_candidates - Number of draft candidates (1, 2, 3, 5)
n_draft_tokens - Draft sequence length (5, 10, 15)
temperature - Sampling temperature (0.7, 1.0, 1.4)
k_values - K-mer sizes for distributions (e.g., [1,3], [1,3,5])
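To show how `num_candidates` and `k_values` might interact, the sketch below scores several draft candidates against k-mer distributions and keeps the highest-scoring one. This is one plausible reading of "k-mer guided" selection, not a description of the repository's algorithm. The function names, the log-probability sum, and the `floor` value for unseen k-mers are all assumptions.

```python
import math


def kmer_score(candidate, distributions, floor=1e-6):
    """Sum log-probabilities of every k-mer in the candidate, for each k.

    distributions maps k -> {kmer: probability}. K-mers missing from a
    distribution get the (assumed) floor probability.
    """
    score = 0.0
    for k, dist in distributions.items():
        for i in range(len(candidate) - k + 1):
            score += math.log(dist.get(candidate[i:i + k], floor))
    return score


def pick_candidate(candidates, distributions):
    """Return the draft candidate with the highest k-mer score."""
    return max(candidates, key=lambda c: kmer_score(c, distributions))


# Toy distributions for k = 1 and k = 3 (made-up numbers for illustration).
toy_distributions = {
    1: {"M": 0.5, "S": 0.3, "K": 0.2},
    3: {"MSK": 0.9},
}
best = pick_candidate(["MSK", "KKK", "SSS"], toy_distributions)
```

In a speculative decoding loop, the selected draft would then be passed to the target model for verification; that step is omitted here.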

`specmer_demo.py`

Quick demonstration script for SpecMER functionality.

Default Configuration:

Protein: GFP
K-values: [1,3]
Temperature: 1.0
Draft tokens: 5
Candidates: 3
Generates 3 example sequences

Usage

Quick Demo

python specmer_demo.py

Full Experiment

python run_kmer_specmer.py \
    --protein GFP \
    --num_candidates 5 \
    --n_draft_tokens 10 \
    --temperature 1.0 \
    --k_values 1,3,5 \
    --num_experiments 50
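To run the same command over a grid of settings, the argument lists can be built programmatically. The sketch below uses only the flags shown above and the parameter values listed for `run_kmer_specmer.py`. The helper name `build_sweep_commands` is hypothetical, and treating those listed values as a full grid is an assumption.

```python
from itertools import product


def build_sweep_commands(protein="GFP", k_values=(1, 3, 5), num_experiments=50):
    """Build one argv list per (num_candidates, n_draft_tokens, temperature)."""
    candidates = (1, 2, 3, 5)
    draft_lengths = (5, 10, 15)
    temperatures = (0.7, 1.0, 1.4)
    commands = []
    for n_cand, n_draft, temp in product(candidates, draft_lengths, temperatures):
        commands.append([
            "python", "run_kmer_specmer.py",
            "--protein", protein,
            "--num_candidates", str(n_cand),
            "--n_draft_tokens", str(n_draft),
            "--temperature", str(temp),
            "--k_values", ",".join(str(k) for k in k_values),
            "--num_experiments", str(num_experiments),
        ])
    return commands


sweep = build_sweep_commands()
```

Each entry can be passed to `subprocess.run(cmd, check=True)` to run the experiments one after another.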

Generate K-mer Distributions

python kmer.py \
    --msa_file data/GFP_AEQVI_full_04-29-2022_b08.a2m \
    --output_dir kmers/kmers_GFP \
    --k_values 1,3,5
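For orientation, an A2M file is FASTA-like: header lines start with `>`, uppercase letters and `-` mark match columns, and lowercase letters and `.` mark insert columns. The sketch below reads such records and strips gaps before k-mer counting. The names `read_a2m_records` and `ungap` are hypothetical, and whether `kmer.py` keeps or discards insert-state residues is not stated here, so keeping them (uppercased) is an assumption.

```python
def read_a2m_records(lines):
    """Yield (header, aligned_sequence) pairs from FASTA/A2M-style lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def ungap(aligned_sequence):
    """Drop gap characters and uppercase insert-state residues (assumed policy)."""
    return "".join(ch.upper() for ch in aligned_sequence if ch not in "-.")


example_lines = [">seq1/1-5", "MS-kG", "e.", ">seq2", "MSKGE"]
records = list(read_a2m_records(example_lines))
residues = [ungap(seq) for _, seq in records]
```

With a real file, the same generator can be fed an open file handle, for example `read_a2m_records(open(path))`.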

Directory Structure

/analysis/ - Analysis and evaluation scripts
/data/ - MSA files and protein sequences
/kmers/ - Pre-computed k-mer distributions
/plotting/ - Visualization scripts
/util/ - Utility functions

Requirements

PyTorch
BioPython
NumPy
Pandas
ProGen2 models
CUDA-compatible GPU (recommended)

Supported Proteins

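| Protein | Length | Context length | MSA depth |
|---------|--------|----------------|-----------|
| GFP | 238 aa | 20 | 396 |
| RBP1 | 52 aa | 10 | 135,922 |
| CBS | 551 aa | 50 | 19,563 |
| ADRB2 | 413 aa | 40 | 204,722 |
| F7YBW8 | 93 aa | 10 | 38,613 |
| GB1 | 55 aa | 10 | 44 |
| Q59976 | 501 aa | 50 | 105,913 |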

Output

Experiments generate CSV files containing:

Generated protein sequences
Acceptance/rejection statistics
Generation timing metrics
K-mer scoring information
