A Nextflow pipeline for scaffolding genome assemblies using Hi-C reads with CHROMAP, YAHS, and Juicer Tools.
- Introduction
- Quick Start
- Inputs
- Outputs
- Dependencies
- Configuration
- Examples
- Troubleshooting
- References
This pipeline scaffolds draft genome assemblies using Hi-C data in three steps:
- Alignment: Hi-C reads are mapped to contigs using CHROMAP.
- Scaffolding: Contigs are ordered/oriented using YAHS.
- Visualization: Files are prepared for manual curation in Juicebox using Juicer Tools.
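As a rough orientation, the three steps above correspond to a short chain of command-line calls. The sketch below only prints such a chain and does not run anything. The flags follow the documented usage of chromap, samtools and YaHS, but the pipeline's actual invocations may differ. The intermediate names `contigs.index` and `aligned.sam` and the default thread count are hypothetical. The final Juicer Tools call that writes the `.hic` file is left out.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (do not execute) commands roughly equivalent to the
# alignment, scaffolding and visualization steps.
# Assumption: flags are taken from each tool's public documentation, not from
# this pipeline's source; intermediate file names are illustrative only.

print_scaffolding_steps() {
  local contigs="$1" r1="$2" r2="$3" threads="${4:-8}"
  cat <<EOF
samtools faidx ${contigs}
chromap -i -r ${contigs} -o contigs.index
chromap --preset hic -r ${contigs} -x contigs.index -1 ${r1} -2 ${r2} --SAM -o aligned.sam -t ${threads}
samtools view -b -o aligned.bam aligned.sam
yahs ${contigs} aligned.bam
juicer pre -a -o out_JBAT yahs.out.bin yahs.out_scaffolds_final.agp ${contigs}.fai
EOF
}
```

Calling `print_scaffolding_steps contigs.fa hic_R1.fastq.gz hic_R2.fastq.gz 16` lists the six commands in order, which can help when reading the pipeline logs.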
nextflow run digenoma-lab/hic-scaffolding-nf \
--contigs contigs.fa \
--r1Reads hic_R1.fastq.gz \
--r2Reads hic_R2.fastq.gz \
-profile conda # Use conda for dependencies
Parameter | Format | Description |
---|---|---|
--contigs | FASTA | Draft assembly contigs. |
--r1Reads | FASTQ(.gz) | Hi-C paired-end reads (R1). |
--r2Reads | FASTQ(.gz) | Hi-C paired-end reads (R2). |
--large | Boolean | Use this option for genomes larger than 4 Gb. |
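Whether `--large` is needed depends on the total assembly length, which can be measured before launching the run. The sketch below sums the sequence lengths in a FASTA file and prints the flag only when the total exceeds the threshold. It assumes that "4 Gb" means 4,000,000,000 bases. The function names `genome_size` and `large_flag` are hypothetical helpers, not part of the pipeline.

```shell
#!/usr/bin/env bash
# Sketch: decide whether to pass --large based on total contig length.
# Assumption: the 4 Gb limit is interpreted as 4,000,000,000 bases.

# Print the total number of bases in a FASTA file (plain or .gz).
genome_size() {
  local fasta="$1"
  local reader="cat"
  case "$fasta" in
    *.gz) reader="gzip -dc" ;;
  esac
  # Skip header lines, strip whitespace, and add up the remaining characters.
  $reader "$fasta" | awk '!/^>/ { gsub(/[ \t\r]/, ""); total += length($0) }
                          END { printf "%.0f\n", total }'
}

# Print "--large" if the given size exceeds the threshold, otherwise nothing.
large_flag() {
  local size="$1" threshold=4000000000
  if [ "$size" -gt "$threshold" ]; then
    echo "--large"
  fi
}
```

One way to use it is to append `$(large_flag "$(genome_size contigs.fa)")` to the `nextflow run` command, so the flag is added only for large assemblies.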
Directory | Files | Description |
---|---|---|
out/chromap/ | aligned.bam | Hi-C read alignments. |
out/scaffolds/ | yahs.out_scaffolds_final.fa | Scaffolded assembly (FASTA). |
out/scaffolds/ | yahs.out_scaffolds_final.agp | AGP file for scaffolding. |
out/juicebox_input/ | out_JBAT.hic | Juicebox-compatible Hi-C map. |
out/juicebox_input/ | out_JBAT.assembly | Assembly file for Juicebox. |
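After a run, a quick check that every file in the table exists and is non-empty can catch a silently failed step. The following is a minimal sketch that assumes the output directory is `out`, as shown in the table. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report expected output files that are missing or empty.

# Paths from the outputs table, relative to the output directory.
expected_outputs() {
  cat <<'EOF'
chromap/aligned.bam
scaffolds/yahs.out_scaffolds_final.fa
scaffolds/yahs.out_scaffolds_final.agp
juicebox_input/out_JBAT.hic
juicebox_input/out_JBAT.assembly
EOF
}

# Print each expected file that does not exist or has zero size.
missing_outputs() {
  local outdir="${1:-out}" rel
  expected_outputs | while IFS= read -r rel; do
    [ -s "$outdir/$rel" ] || echo "$outdir/$rel"
  done
}
```

Running `missing_outputs out` prints nothing when all five files are present.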
- Core:
  - Nextflow (≥20.04.0)
  - CHROMAP (alignment)
  - YAHS (scaffolding)
  - Juicer Tools (visualization)
- Conda (recommended for CHROMAP/YAHS): -profile conda
Add custom configuration in nextflow.config:
params {
    juicer_tools_jar = "/path/to/juicer_tools.jar"
}
process {
    withName: 'PRINT_VERSIONS' {
        cpus = 1
        memory = 1.GB
    }
    withName: 'SAMTOOLS_FAIDX' {
        cpus = 1
        memory = 1.GB
    }
    withName: 'CHROMAP_INDEX' {
        cpus = 10
        memory = 100.GB
    }
    withName: 'CHROMAP_ALIGN' {
        cpus = 44
        memory = 100.GB
    }
    withName: 'YAHS_SCAFFOLD' {
        cpus = 10
        memory = 100.GB
    }
    withName: 'JUICER_PRE' {
        cpus = 10
        memory = 100.GB
    }
    withName: 'ASSEMBLY_STATS' {
        cpus = 1
        memory = 10.GB
    }
}
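The resource values above suit a large server. On a smaller machine, one option is to keep the defaults untouched and pass a small override file with Nextflow's `-c` option. The sketch below writes such a file for a single process. The helper name `write_resource_override` and the file name used in the usage note are hypothetical. The process names are the ones listed in the configuration above.

```shell
#!/usr/bin/env bash
# Sketch: write a Nextflow config fragment that overrides the CPU and memory
# settings of one process. Arguments: output path, process name, CPUs, memory in GB.

write_resource_override() {
  local path="$1" process_name="$2" cpus="$3" memory_gb="$4"
  cat > "$path" <<EOF
process {
    withName: '${process_name}' {
        cpus = ${cpus}
        memory = ${memory_gb}.GB
    }
}
EOF
}
```

For example, `write_resource_override small.config CHROMAP_ALIGN 16 64` creates a file that can be added to the run command as `-c small.config`.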
nextflow run hic-scaffolding-nf/main.nf \
--contigs sl_female_ont_purge_r2.fasta \
--r1Reads DDU_AAOSDF_4_1_HFYVJDSX7.UDI488_clean.fastq.gz \
--r2Reads DDU_AAOSDF_4_2_HFYVJDSX7.UDI488_clean.fastq.gz \
-profile uoh # Example profile
- Missing files: Ensure all input paths are correct.
- Conda issues: Use -profile conda or install dependencies manually.
- Juicer Tools: Specify the JAR path with --juicer_tools_jar.
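Most of these problems can be caught before launching the pipeline. The sketch below is a small pre-flight check that reports any input path that is missing or empty. The function name `check_readable` is a hypothetical helper, and the file names in the usage note are the ones from the Quick Start command plus the JAR path from the configuration.

```shell
#!/usr/bin/env bash
# Sketch: verify that every given path exists and is non-empty.
# Prints one message per problem to stderr and returns non-zero if any was found.

check_readable() {
  local status=0 path
  for path in "$@"; do
    if [ ! -s "$path" ]; then
      echo "missing or empty: $path" >&2
      status=1
    fi
  done
  return "$status"
}
```

A typical call would be `check_readable contigs.fa hic_R1.fastq.gz hic_R2.fastq.gz /path/to/juicer_tools.jar && echo "inputs look fine"`.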