Loopsim was tested with the following environment:
- Python >=3.8
- Linux (Ubuntu 20.04 LTS)
# Install from PyPI
pip install loopsim
# Or install directly from GitHub
pip install git+https://github.com/CutaneousBioinf/Loopsim
# Or install from a local clone
git clone https://github.com/CutaneousBioinf/Loopsim
cd Loopsim
# Option 1: pip
pip install .
# Option 2: poetry
poetry install
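To confirm that an environment matches the tested configuration, a small check like the one below may help. It is only a sketch and not part of Loopsim: `check_environment` is a hypothetical helper, and it assumes the installed distribution is named `loopsim`, as in the `pip install loopsim` command.

```python
import sys
from importlib import metadata


def check_environment(min_python=(3, 8), package="loopsim"):
    """Return (python_ok, installed_version_or_None).

    Hypothetical helper: the package name is assumed from `pip install loopsim`.
    """
    python_ok = sys.version_info[:2] >= min_python
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        installed = None  # the distribution is not installed in this environment
    return python_ok, installed


python_ok, installed = check_environment()
print("Python >= 3.8:", python_ok)
print("loopsim version:", installed or "not installed")
```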
The Loopsim pipeline requires three types of files:
- Type Name: Chromatin loop file
  - Purpose: Store chromatin loop (Hi-C) data
  - Format: Delimiter-separated values file in BEDPE format. Each row represents one chromatin loop. Columns are as follows:

    Start region chromosome | First locus of start region | Last locus of start region | End region chromosome | First locus of end region | Last locus of end region |
    ---|---|---|---|---|---|

  - Example: merged_5K_10K.loop
- Type Name: Chromosome region file
  - Purpose: Defines the region of each chromosome
  - Format: Delimiter-separated values file in BED format. Each row represents one chromosome region. Columns are as follows:

    Chromosome | Start position | End position |
    ---|---|---|

  - Example: chr_region_hg19
- Type Name: Genomic regions of interest
  - Purpose: Store genomic regions of interest to be checked against the chromatin loop file
  - Format: Delimiter-separated values file. Each row represents one genomic region of interest. Columns are as follows:

    Chromosome | Start locus of region | End locus of region |
    ---|---|---|

  - Example: 95_BCS_psor_loci
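The six-column loop layout described above can be read with a few lines of Python. This is a sketch for inspecting your own files, not Loopsim's parser: `read_loops` and `find_problems` are hypothetical names, and the code assumes a tab delimiter, no header row, and integer coordinates in the first six columns. The rows in the example are made up.

```python
import csv
import io


def read_loops(handle, delimiter="\t"):
    """Parse rows into (chrom1, start1, end1, chrom2, start2, end2) tuples.

    Assumes no header row; extra columns after the sixth are ignored.
    """
    loops = []
    for line_no, row in enumerate(csv.reader(handle, delimiter=delimiter), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) < 6:
            raise ValueError(f"line {line_no}: expected at least 6 columns, got {len(row)}")
        chrom1, start1, end1, chrom2, start2, end2 = row[:6]
        loops.append((chrom1, int(start1), int(end1), chrom2, int(start2), int(end2)))
    return loops


def find_problems(loops):
    """Return messages for rows whose region start lies after its end."""
    problems = []
    for i, (_, s1, e1, _, s2, e2) in enumerate(loops):
        if s1 > e1 or s2 > e2:
            problems.append(f"row {i}: region start is after region end")
    return problems


# Made-up rows for illustration only; the second row is deliberately malformed.
example = io.StringIO(
    "chr1\t10000\t15000\tchr1\t200000\t205000\n"
    "chr2\t50000\t40000\tchr2\t90000\t95000\n"
)
loops = read_loops(example)
print(find_problems(loops))
```

For a real file, pass an open file handle instead of the `io.StringIO` object. The `validate` command remains the supported way to check input; this sketch only illustrates the column layout.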
Loopsim is broken down into a number of different commands:
Step | Command | Command Description |
---|---|---|
1 | validate | Validates the input loop file. Issues warnings about possibly erroneous data and removes some types of erroneous data. |
2 | simulate | Produces a distribution of simulated loop files. Note that this can be a very intensive task, depending on the number of simulations you require. I recommend splitting any run of more than 30 simulations into multiple batches, possibly as a collection of SLURM jobs. |
3 | analyze or batch-analyze | Use batch-analyze to produce summary tables with overlaps for the simulated distribution of loop files. Use analyze to do the same for a single loop file, such as the original. |
4 | visualize | Produces visualizations, outputs summary statistics, and performs a statistical test comparing the original loop file against the simulated distribution. |
Figure: Loopsim pipeline
You can run `loopsim --help` for a broad overview of each of the commands.
$ loopsim --help
Usage: loopsim [OPTIONS] COMMAND [ARGS]...
For a more thorough explanation of what every command does, please see the
documentation.
Options:
--delimiter TEXT delimiter for outputted files [default: tab]
--version Show the version and exit.
--help Show this message and exit.
Commands:
analyze Perform analysis on a single loop file
batch-analyze Perform analysis on a distribution of loop files
simulate Generate a distribution of simulations
validate Validate input file and output a validated version
visualize Get visualization and stats from distribution of ratios
You can also run `loopsim <COMMAND> --help` for more detailed help messages on each of the commands.
For example, here is the help message for `simulate`:
$ loopsim simulate --help
Usage: loopsim simulate [OPTIONS] LOOP_IN_FILE
CHROMOSOME_REGION_FILE
SIMULATION_DATA_DIRECTORY
Generate a distribution of simulations.
NOTE: any data in SIMULATION_DATA_DIRECTORY may be overwritten!!
Options:
--num-sims INTEGER number of simulations [default: 1]
--num-processes INTEGER number of threads to use
[default: round(multiprocessing.cpu_count() / 2)]
--help Show this message and exit.
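When a large number of simulations is split into batches, each batch needs its own `simulate` invocation. The sketch below plans such batches; it is not part of Loopsim. `plan_batches` and the `batch_<n>` directory naming are hypothetical, and giving each batch a separate directory is an assumption based on the help text's warning that data in SIMULATION_DATA_DIRECTORY may be overwritten.

```python
def plan_batches(total_sims, batch_size, base_dir="sims"):
    """Split total_sims into (output_directory, num_sims) pairs.

    Each batch gets its own directory so that one batch cannot
    overwrite the files written by another.
    """
    if total_sims < 0 or batch_size < 1:
        raise ValueError("total_sims must be >= 0 and batch_size must be >= 1")
    batches = []
    remaining = total_sims
    index = 0
    while remaining > 0:
        n = min(batch_size, remaining)
        batches.append((f"{base_dir}/batch_{index}/", n))
        remaining -= n
        index += 1
    return batches


# Print one simulate command per batch, e.g. to paste into separate SLURM job scripts.
for out_dir, n in plan_batches(100, 30):
    print(f"loopsim simulate --num-sims {n} loop_valid.loop example_data/chr_region_hg19 {out_dir}")
```

How the per-batch outputs are combined before analysis is left open here, since that depends on how you organize your directories.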
Below, you can find a guided walkthrough of the Loopsim tool.
To follow along with the guided walkthrough, just download the repository and install Loopsim. The Loopsim repository includes all the necessary example files.
$ loopsim validate example_data/merged_5K_10K.loop loop_valid.loop example_data/chr_region_hg19
Input loop file: example_data/merged_5K_10K.loop
Output loop file: loop_valid.loop
Chromosome regions file: example_data/chr_region_hg19
Flagging loop ends that are >= 1.000000e+05
Delimiter for output: ' '
Validating loop data
Validation complete
Validated data outputted to file loop_valid.loop
Files after:
.
└── loop_valid.loop
$ loopsim simulate --num-sims 2 loop_valid.loop example_data/chr_region_hg19 sims/
Input loop file: loop_valid.loop
Chromosome regions file: example_data/chr_region_hg19
Number of simulations: 2
Number of processes: 5
Outputting simulation files to directory: sims/
Delimiter for output: ' '
Simulation 0 simulation started
Simulation 1 simulation start
Simulation 0 simulation complete
Simulation 1 simulation complete
Simulation 0 data outputted to file: sims/sim_hi-c_0.loop
Simulation 1 data outputted to file: sims/sim_hi-c_1.loop
Files after:
.
└── sims
    ├── sim_hi-c_0.loop
    └── sim_hi-c_1.loop
$ loopsim batch-analyze sims/ example_data/95_BCS_psor_loci ratios_out.txt --loop-out-directory loop_out_dir/
Input loop files directory: sims/
Intervals file: example_data/95_BCS_psor_loci
Ratio distribution file: ratios_out.txt
Delimiter for output: ' '
Output loop files directory: loop_out_dir/
Output directory does not exist.
Output directory created!
Finished outputting analyzed files to loop_out_dir/
Finished outputting ratio distribution to ratios_out.txt
Files after:
.
├── ratios_out.txt
└── loop_out_dir
    ├── summary_table_0.loop
    └── summary_table_1.loop
$ loopsim analyze loop_valid.loop loop_analyzed.loop example_data/95_BCS_psor_loci
Input loop file: loop_valid.loop
Output loop file: loop_analyzed.loop
Intervals file: example_data/95_BCS_psor_loci
Delimiter for output: ' '
Outputted analyzed loop file to loop_analyzed.loop
Ratio of overlapping intervals out of the total number of loops was: 0.034299968818210166
Files after:
Note: We don't use `loop_analyzed.loop` in the pipeline again.
.
└── loop_analyzed.loop
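The ratio reported by `analyze` can be pictured as the fraction of loops that touch a region of interest. The sketch below shows one way such a ratio could be computed. It is an illustration under stated assumptions, not Loopsim's implementation: here a loop counts as overlapping when at least one of its two end regions shares a position with a region of interest on the same chromosome, which may differ from the rule Loopsim applies. The function names and data are hypothetical.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """True if closed intervals [start_a, end_a] and [start_b, end_b] share a position."""
    return start_a <= end_b and start_b <= end_a


def overlap_ratio(loops, regions):
    """Fraction of loops with at least one end region overlapping a region of interest."""
    if not loops:
        return 0.0
    hits = 0
    for c1, s1, e1, c2, s2, e2 in loops:
        anchors = ((c1, s1, e1), (c2, s2, e2))
        if any(
            chrom == r_chrom and overlaps(start, end, r_start, r_end)
            for chrom, start, end in anchors
            for r_chrom, r_start, r_end in regions
        ):
            hits += 1
    return hits / len(loops)


# Made-up data: two of the four loops have an end region inside chr1:950-1550.
loops = [
    ("chr1", 100, 200, "chr1", 900, 1000),
    ("chr1", 300, 400, "chr1", 1500, 1600),
    ("chr2", 100, 200, "chr2", 900, 1000),
    ("chr1", 5000, 5100, "chr1", 9000, 9100),
]
regions = [("chr1", 950, 1550)]
print(overlap_ratio(loops, regions))  # 0.5
```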
$ loopsim visualize ratios_out.txt dist_plot.jpg --other 0.034299968818210166
Obtaining overlapping ratios from: ratios_out.txt.
Exported plot to dist_plot.jpg
Summary stats:
Distribution mean: 0.0178775595052489
Distribution std: 0.000808458018194828
Distribution min: 0.0173058933582787
Distribution median: 0.0178775595052489
Distribution max: 0.0184492256522191
Calculating p-value based on empirical distribution:
p-value: 0.0
Calculating p-value based on normal distribution:
p-value: 0.0
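The summary statistics and p-values above can be approximated with the Python standard library. This is a sketch, not Loopsim's code. It assumes a one-sided (upper-tail) test and a sample standard deviation, and the two simulated ratios are inferred from the reported minimum and maximum, since only two simulations were run. `empirical_p` and `normal_p` are hypothetical names.

```python
from statistics import NormalDist, mean, stdev


def empirical_p(simulated, observed):
    """Fraction of simulated ratios that are at least as large as the observed ratio."""
    return sum(1 for r in simulated if r >= observed) / len(simulated)


def normal_p(simulated, observed):
    """Upper-tail p-value under a normal distribution fitted to the simulated ratios."""
    fitted = NormalDist(mean(simulated), stdev(simulated))
    return 1.0 - fitted.cdf(observed)


# With N = 2, the reported min and max are the two simulated ratios.
simulated = [0.0173058933582787, 0.0184492256522191]
observed = 0.034299968818210166  # ratio printed by `loopsim analyze` in the walkthrough

print("mean:", mean(simulated))
print("std:", stdev(simulated))
print("empirical p:", empirical_p(simulated, observed))
print("normal p:", normal_p(simulated, observed))
```

With only two simulations, the empirical p-value defined this way can only take the values 0, 0.5, or 1, so it says little until many more simulations are run.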
Plot
Note: The rather odd-looking distribution plot and $p = 0$ are artifacts of running only $N = 2$ simulations.
Figure: dist_plot.jpg
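The walkthrough steps above can also be scripted. The sketch below builds the same command lines as Python argument lists. The helper names are hypothetical, the file names are the ones used in the walkthrough, and the argument order is copied from the commands shown above rather than from any Loopsim API.

```python
import shlex
import subprocess


def build_pipeline(loop_file, regions_file, intervals_file, num_sims):
    """Return the validate, simulate, batch-analyze and analyze commands as argv lists."""
    return [
        ["loopsim", "validate", loop_file, "loop_valid.loop", regions_file],
        ["loopsim", "simulate", "--num-sims", str(num_sims),
         "loop_valid.loop", regions_file, "sims/"],
        ["loopsim", "batch-analyze", "sims/", intervals_file, "ratios_out.txt",
         "--loop-out-directory", "loop_out_dir/"],
        ["loopsim", "analyze", "loop_valid.loop", "loop_analyzed.loop", intervals_file],
    ]


def visualize_command(observed_ratio, ratios_file="ratios_out.txt", plot_file="dist_plot.jpg"):
    """Return the visualize command; observed_ratio is the ratio printed by analyze."""
    return ["loopsim", "visualize", ratios_file, plot_file, "--other", str(observed_ratio)]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


commands = build_pipeline(
    "example_data/merged_5K_10K.loop",
    "example_data/chr_region_hg19",
    "example_data/95_BCS_psor_loci",
    num_sims=2,
)
for argv in commands:
    print(shlex.join(argv))
```

Calling `run_all(commands)` would execute the steps. The ratio for `visualize_command` still has to be read from the output of `analyze`, as in the walkthrough.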
If you use Loopsim in your work, please cite as follows:
Plain
Gideon Shaked, Haihan Zhang, Zhaolin Zhang, Jiayu Zhou, Johann E Gudjonsson, James T Elder, Matthew T Patrick, Lam C Tsoi,
Loopsim: enrichment analysis of chromosome conformation capture with fast empirical distribution simulation,
NAR Genomics and Bioinformatics, Volume 7, Issue 3, September 2025, lqaf098,
https://doi.org/10.1093/nargab/lqaf098
BibTeX
@article{
author = {Shaked, Gideon and Zhang, Haihan and Zhang, Zhaolin and Zhou, Jiayu and Gudjonsson, Johann E and Elder, James T and Patrick, Matthew T and Tsoi, Lam C},
title = {Loopsim: enrichment analysis of chromosome conformation capture with fast empirical distribution simulation},
volume = {7},
issn = {2631-9268},
shorttitle = {Loopsim},
url = {https://doi.org/10.1093/nargab/lqaf098},
doi = {10.1093/nargab/lqaf098},
number = {3},
journal = {NAR Genomics and Bioinformatics},
month = sep,
year = {2025},
pages = {lqaf098},
abstract = {Gene regulation is intricately influenced by the three-dimensional organization of the genome. In particular, chromatin can exist in loop structures that enable long-range regulatory interactions. By utilizing chromosome conformation capture techniques such as Hi-C, valuable information regarding the organization of these loop structures in 3D space can be obtained. Although functional/feature enrichment is now a common downstream analysis for various genomic platforms to provide biological context, tools specifically designed for high-throughput assays that capture chromosome conformation remain relatively limited. Here, we present Loopsim, a command-line application that performs enrichment analysis on Hi-C loop profiles against user-defined regions (available on GitHub at https://github.com/CutaneousBioinf/Loopsim). Loopsim efficiently simulates a background distribution using a distinctive sampling approach that considers loop size, intervals, loop–loop distances, and structure; it then computes statistics based on the empirical null distribution.},
}