digenoma-lab/hic-scaffolding-nf

 
 


Hi-C Scaffolding Nextflow Pipeline

A Nextflow pipeline for scaffolding genome assemblies using Hi-C reads with CHROMAP, YAHS, and Juicer Tools.



Introduction

This pipeline scaffolds draft genome assemblies using Hi-C data in three steps:

  1. Alignment: Hi-C reads are mapped to contigs using CHROMAP.
  2. Scaffolding: Contigs are ordered/oriented using YAHS.
  3. Visualization: Files are prepared with Juicer Tools for manual curation in Juicebox.

Quick Start

nextflow run digenoma-lab/hic-scaffolding-nf \
    --contigs contigs.fa \
    --r1Reads hic_R1.fastq.gz \
    --r2Reads hic_R2.fastq.gz \
    -profile conda  # Use conda for dependencies

Inputs

Parameter    Format       Description
--contigs    FASTA        Draft assembly contigs.
--r1Reads    FASTQ(.gz)   Hi-C paired-end reads (R1).
--r2Reads    FASTQ(.gz)   Hi-C paired-end reads (R2).
--large      Boolean      Use this option for genomes larger than 4 Gb.
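
If you are unsure whether an assembly crosses the 4 Gb threshold for --large, a quick estimate of the total contig length can help. The sketch below is not part of the pipeline: the function names total_bases and needs_large are hypothetical, and it assumes that "4 Gb" means 4,000,000,000 bases and that the contigs file is an uncompressed FASTA.

```shell
# Hypothetical helper: sum the lengths of all sequence lines in a plain FASTA file.
total_bases() {
    # Skip header lines (starting with ">") and add up the remaining line lengths.
    # "%.0f" avoids 32-bit integer limits in some awk implementations.
    awk '!/^>/ { n += length($0) } END { printf "%.0f\n", n }' "$1"
}

# Hypothetical helper: succeed if the assembly is larger than the threshold.
# The default threshold of 4000000000 bases is an assumption about what "4 Gb" means.
needs_large() {
    local fasta="$1"
    local threshold="${2:-4000000000}"
    local total
    total="$(total_bases "$fasta")"
    [ "$total" -gt "$threshold" ]
}

# Example usage (contigs.fa is a placeholder path):
#   if needs_large contigs.fa; then echo "add --large to the nextflow run command"; fi
```

Because --large is a Boolean, passing the bare flag on the command line is enough; Nextflow sets a parameter given without a value to true.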

Outputs

Directory             Files                          Description
out/chromap/          aligned.bam                    Hi-C read alignments.
out/scaffolds/        yahs.out_scaffolds_final.fa    Scaffolded assembly (FASTA).
out/scaffolds/        yahs.out_scaffolds_final.agp   AGP file for scaffolding.
out/juicebox_input/   out_JBAT.hic                   Juicebox-compatible Hi-C map.
out/juicebox_input/   out_JBAT.assembly              Assembly file for Juicebox.
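
As a quick sanity check after a run, you can compare the number of sequences in the input contigs with the number in the scaffolded FASTA; scaffolding should normally reduce the count. This is a minimal sketch: count_seqs and compare_assemblies are hypothetical helper names, not pipeline commands, and both files are assumed to be uncompressed FASTA.

```shell
# Hypothetical helper: count the sequences (header lines) in a FASTA file.
count_seqs() {
    # "n + 0" prints 0 rather than an empty string when there are no headers.
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

# Hypothetical helper: print sequence counts before and after scaffolding.
compare_assemblies() {
    local before after
    before="$(count_seqs "$1")"
    after="$(count_seqs "$2")"
    echo "contigs: $before  scaffolds: $after"
}

# Example usage (paths follow the table above):
#   compare_assemblies contigs.fa out/scaffolds/yahs.out_scaffolds_final.fa
```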

Dependencies


Configuration

Running Locally or on Other Clusters

  1. Conda (recommended for CHROMAP/YAHS):
    -profile conda

Add custom configuration in nextflow.config:

params {
  juicer_tools_jar = "/path/to/juicer_tools.jar"
}
process {
    withName: 'PRINT_VERSIONS' {
        cpus = 1
        memory = 1.GB
    }
    
    withName: 'SAMTOOLS_FAIDX' {
        cpus = 1
        memory = 1.GB
    }

    withName: 'CHROMAP_INDEX' {
        cpus = 10
        memory = 100.GB
    }

    withName: 'CHROMAP_ALIGN' {
        cpus = 44
        memory = 100.GB
    }

    withName: 'YAHS_SCAFFOLD' {
        cpus = 10
        memory = 100.GB
    }


    withName: 'JUICER_PRE' {
        cpus = 10
        memory = 100.GB
    }

    withName: 'ASSEMBLY_STATS' {
        cpus = 1
        memory = 10.GB
    }

}

Examples

Basic Run

nextflow run hic-scaffolding-nf/main.nf \
    --contigs sl_female_ont_purge_r2.fasta \
    --r1Reads DDU_AAOSDF_4_1_HFYVJDSX7.UDI488_clean.fastq.gz \
    --r2Reads DDU_AAOSDF_4_2_HFYVJDSX7.UDI488_clean.fastq.gz \
    -profile uoh  # Example profile

Troubleshooting

  • Missing files: Ensure all input paths are correct.
  • Conda issues: Use -profile conda or install dependencies manually.
  • Juicer Tools: Specify the JAR path with --juicer_tools_jar.
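
Most of the issues above come down to a path that does not exist, so it can be worth checking the inputs before launching a long run. The following is only a sketch: check_inputs is a hypothetical function, not something the pipeline provides, and the paths in the usage comment are placeholders.

```shell
# Hypothetical pre-flight check: report every argument that is missing,
# unreadable, or empty, and return non-zero if any problem was found.
check_inputs() {
    local status=0
    local f
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "not found or not readable: $f" >&2
            status=1
        elif [ ! -s "$f" ]; then
            echo "file is empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example usage (placeholder paths):
#   check_inputs contigs.fa hic_R1.fastq.gz hic_R2.fastq.gz /path/to/juicer_tools.jar
```

The function checks all arguments instead of stopping at the first failure, so a single call lists every problem path.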

References
