This R package provides tools for generating consensus Topologically Associating Domains (TADs) from multiple prediction methods. TADs are fundamental units of chromatin organization that play crucial roles in gene regulation. While multiple computational tools exist to predict TAD boundaries from Hi-C data, their results often vary significantly. This package implements methods to integrate predictions from multiple tools and generate high-confidence consensus TAD sets.
# Install from GitHub
devtools::install_github("CSOgroup/consensusTADs", build_vignettes = TRUE)
- Generate consensus TADs from multiple prediction tools
- Calculate Measure of Concordance (MoC) between TAD predictions
- Select optimal non-overlapping TAD sets using dynamic programming
- Apply iterative threshold approach for consensus building
Creates consensus TADs through an iterative threshold approach that selects optimal non-overlapping TADs representing agreement across different prediction methods.
consensus_tads <- generate_tad_consensus(
  df_tools,        # Data frame with TAD predictions
  threshold = 0,   # Minimum MoC threshold
  step = -0.05     # Step size for threshold iteration
)
Calculates the Measure of Concordance (MoC) between TAD predictions and filters significant overlaps based on a threshold.
Implements a dynamic programming algorithm to select a set of non-overlapping TADs that maximize the total MoC score.
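The selection step described above resembles the classic weighted interval scheduling problem. The following is a minimal sketch of that technique, not the package's actual implementation: `select_best_tads` is a hypothetical helper, the scores are made-up values, and intervals that merely touch (one ends where the next starts) are assumed to count as non-overlapping.

```r
# Hypothetical helper (not part of the package API): choose non-overlapping
# intervals that maximize the summed score via weighted interval scheduling.
select_best_tads <- function(start, end, score) {
  # Sort intervals by end coordinate
  ord <- order(end, start)
  start <- start[ord]
  end <- end[ord]
  score <- score[ord]
  n <- length(start)

  # prev[i]: how many earlier intervals finish at or before start[i]
  prev <- vapply(
    seq_len(n),
    function(i) sum(end[seq_len(i - 1)] <= start[i]),
    integer(1)
  )

  # best[i + 1]: best total score achievable with the first i intervals
  best <- numeric(n + 1)
  for (i in seq_len(n)) {
    best[i + 1] <- max(best[i], score[i] + best[prev[i] + 1])
  }

  # Walk back through the table to recover the chosen intervals
  keep <- logical(n)
  i <- n
  while (i > 0) {
    if (score[i] + best[prev[i] + 1] >= best[i]) {
      keep[i] <- TRUE
      i <- prev[i]
    } else {
      i <- i - 1
    }
  }

  data.frame(start = start[keep], end = end[keep], score = score[keep])
}

# Three candidate TADs with illustrative (made-up) scores
selected <- select_best_tads(
  start = c(10000, 20000, 50000),
  end   = c(30000, 45000, 65000),
  score = c(0.81, 0.70, 0.60)
)
print(selected)
```

In this toy input the first two candidates overlap, so only the higher-scoring one is kept alongside the third.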
# Prepare input data with predictions from multiple tools
tad_data <- data.frame(
  chr = rep("chr1", 6),
  start = c(10000, 20000, 50000, 12000, 22000, 48000),
  end = c(30000, 45000, 65000, 32000, 43000, 67000),
  meta.tool = c(rep("tool1", 3), rep("tool2", 3))
)
# Generate consensus TADs with default parameters
library(consensusTADs)
consensus_results <- generate_tad_consensus(tad_data)
print(consensus_results)
# Generate consensus TADs with custom threshold values
custom_consensus <- generate_tad_consensus(
  tad_data,
  threshold = 0.3,
  step = -0.1
)
The consensus generation process follows these steps:
- Input validation: Check if the input contains data from multiple prediction tools
- Data preparation: Split the input data by chromosome
- Threshold sequence generation: Create a sequence of threshold values
- Iterative TAD selection: For each chromosome and threshold, calculate MoC scores and select optimal TADs
- Result compilation: Combine results from all chromosomes
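The first three steps can be sketched in base R as follows. This is an illustration only: `prepare_consensus_inputs` is a hypothetical helper, and starting the threshold sequence at 1 (the maximum possible MoC) is an assumption rather than documented package behavior.

```r
# Hypothetical helper (not part of the package API): validate the input,
# split it by chromosome, and build a decreasing threshold sequence.
prepare_consensus_inputs <- function(df, threshold = 0, step = -0.05) {
  # Input validation: a consensus needs predictions from at least two tools
  if (length(unique(df$meta.tool)) < 2) {
    stop("Input must contain predictions from at least two tools")
  }
  list(
    # Data preparation: one data frame per chromosome
    by_chr = split(df, df$chr),
    # Threshold sequence: assumed to run from 1 down to `threshold`
    thresholds = seq(1, threshold, by = step)
  )
}

tad_data <- data.frame(
  chr = rep("chr1", 6),
  start = c(10000, 20000, 50000, 12000, 22000, 48000),
  end = c(30000, 45000, 65000, 32000, 43000, 67000),
  meta.tool = c(rep("tool1", 3), rep("tool2", 3))
)

inputs <- prepare_consensus_inputs(tad_data, threshold = 0.3, step = -0.1)
names(inputs$by_chr)
inputs$thresholds
```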
The MoC score quantifies the agreement between two TAD predictions:
MoC = (intersection_width)² / (width1 × width2)
Where:
- intersection_width is the length of the overlap between two TADs
- width1 and width2 are the lengths of the two TADs being compared
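As a worked example, the formula can be evaluated directly in base R. `moc_pair` is a hypothetical helper written for illustration, and it assumes widths are computed as `end - start`; the package may count interval widths differently (for example, inclusive 1-based ranges).

```r
# Hypothetical helper (not part of the package API): MoC for two intervals,
# assuming width = end - start.
moc_pair <- function(start1, end1, start2, end2) {
  # Overlap length; zero when the intervals do not intersect
  intersection_width <- max(0, min(end1, end2) - max(start1, start2))
  width1 <- end1 - start1
  width2 <- end2 - start2
  intersection_width^2 / (width1 * width2)
}

# TADs spanning 10000-30000 and 12000-32000 overlap by 18000 bases:
# 18000^2 / (20000 * 20000) = 0.81
moc_pair(10000, 30000, 12000, 32000)
```

Identical TADs score 1 and disjoint TADs score 0, so a higher MoC means closer agreement between the two predictions.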
- dplyr
- GenomicRanges
- IRanges
- tibble
- purrr
- tidyr
- stringr
- magrittr