Commands to manipulate VCF files
Christine Tranchant-Dubreuil edited this page Jun 2, 2017
| Authors | Christine Tranchant-Dubreuil |
|---|---|
| Research Unit | UMR DIADE |
| Institute | |
This page describes some tools to manipulate VCF files and to easily extract information from them. For this tutorial, you need:
- a VCF file
- the reference (FASTA) file used for the mapping step.
- Extracting the list of samples from a VCF file with the `grep` and `cut` commands
- Extracting a subset of samples from a multigenome VCF file with GATK `SelectVariants`
- Extracting a subset of samples from a multigenome VCF file with `bcftools`
- Calculating the nucleotide diversity from a VCF file with `vcftools`
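### Extracting the list of samples from a VCF file with `grep` and `cut`

The sample names are stored in the `#CHROM` header line, from column 10 onwards.

One line with all samples:

```bash
grep "^#CHROM" inputFileName.vcf | cut -f 10-
```

One line per sample:

```bash
grep "^#CHROM" inputFileName.vcf | cut -f 10- | xargs -n 1
```

### Extracting a subset of samples from a multigenome VCF file with GATK `SelectVariants`

To keep only the samples given on the command line (`-sn`, repeated once per sample):

```bash
java -Xmx12g -jar /usr/local/gatk-3.6/GenomeAnalysisTK.jar -T SelectVariants \
    -R reference.fa -V inputFileName.vcf -o outputFileName.vcf \
    -sn sample1 -sn sample2
```

Note: if you get the error message "Fasta dict file ... for reference ... does not exist", see https://www.broadinstitute.org/gatk/guide/article?id=1601 for help creating it.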
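GATK's `--sample_file` option and the `-S` option of `bcftools` both read a plain-text file with one sample name per line (here `barthii.only.RG.list`). One way to build such a file is to reuse the `#CHROM` header trick and filter the names. The sketch below is only an illustration: it assumes that the wanted samples share a `barthii` name prefix, and it uses a tiny made-up VCF (`toy.vcf`) with hypothetical sample names so that it runs anywhere.

```bash
set -eu

# Hypothetical minimal VCF: sample names start at column 10 of the #CHROM line.
printf '##fileformat=VCFv4.2\n' > toy.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tbarthii_1\tglaberrima_1\tbarthii_2\n' >> toy.vcf
printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\t1/1\n' >> toy.vcf

# Take the header line, drop the 9 fixed columns, put one name per line,
# then keep only the names matching the (assumed) "barthii" prefix.
grep '^#CHROM' toy.vcf | cut -f 10- | tr '\t' '\n' | grep '^barthii' > barthii.only.RG.list

cat barthii.only.RG.list
```

On your own data, replace `toy.vcf` with your VCF and adapt the final `grep` pattern to your sample naming scheme.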
To keep only the samples listed in a file (one sample name per line):

```bash
java -Xmx12g -jar /usr/local/gatk-3.6/GenomeAnalysisTK.jar -T SelectVariants \
    -R reference.fa -V inputFileName.vcf -o outputFileName.vcf \
    --sample_file barthii.only.RG.list --ALLOW_NONOVERLAPPING_COMMAND_LINE_SAMPLES
```

To exclude the samples listed in a file:

```bash
java -Xmx12g -jar /usr/local/gatk-3.6/GenomeAnalysisTK.jar -T SelectVariants \
    -R reference.fa -V inputFileName.vcf -o outputFileName.vcf \
    --exclude_sample_file barthii.only.RG.list --ALLOW_NONOVERLAPPING_COMMAND_LINE_SAMPLES
```

Note: if you get the error message "Bad input: Samples entered on command line (through -sf or -sn)) that are not present in the VCF", run with `--ALLOW_NONOVERLAPPING_COMMAND_LINE_SAMPLES`.
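Before adding `--ALLOW_NONOVERLAPPING_COMMAND_LINE_SAMPLES`, it can be useful to see which listed samples are actually missing from the VCF, since those are the ones that trigger the "not present in the VCF" error. A minimal sketch with standard `grep` follows; `toy.vcf`, `samples.list` and the sample names are made up for the example.

```bash
set -eu

# Hypothetical inputs: a VCF with two samples and a list naming three.
printf '##fileformat=VCFv4.2\n' > toy.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tbarthii_1\tbarthii_2\n' >> toy.vcf
printf 'barthii_1\nbarthii_2\nbarthii_3\n' > samples.list

# Sample names present in the VCF, one per line.
grep '^#CHROM' toy.vcf | cut -f 10- | tr '\t' '\n' > vcf_samples.txt

# Listed samples that do not exactly match (-x, fixed strings -F) any VCF sample.
# grep exits with status 1 when nothing is missing, hence "|| true".
grep -vxF -f vcf_samples.txt samples.list > missing_samples.txt || true

cat missing_samples.txt
```

If the missing names are typos, fixing the list is usually safer than silencing the check; if they are expected (e.g. one list shared across several VCFs), the flag above is the simpler option.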
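### Extracting a subset of samples from a multigenome VCF file with `bcftools`

```bash
bcftools view -S barthii.only.RG.list --force-samples -o outputFileName.vcf inputFileName.vcf
```

`-S` takes a file with one sample name per line; `--force-samples` only warns about listed samples that are absent from the VCF instead of stopping with an error.

### Calculating the nucleotide diversity from a VCF file with `vcftools`

Nucleotide diversity (π) in windows of 100,000 bp, ignoring sites whose FILTER value is not PASS:

```bash
vcftools --vcf inputFileName.vcf --out outputFileName.PI --window-pi 100000 --remove-filtered-all
```

Average π over all windows, computed from column 5 (PI) of the `.windowed.pi` file produced above; `grep -v "PI"` drops the header line. For each window, the command prints its π value, the running sum and the window count, then the average:

```bash
grep -v "PI" outputFileName.PI.windowed.pi | awk '{ sum+=$5; print $5,"; ",sum , "* ", NR ; } END { print "PI average :", sum / NR; }'
```

The resource material is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/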



