To install this package (Go must be installed), run:
pip install .
To compile the Go modules with C types so that they work with Python, run:
export CGO_ENABLED=1
go build -o main.so -buildmode=c-shared main.go
The MatrixTableConsumer class performs operations on a Hail matrix table:
- MatrixTableConsumer().prepare_metadata_for_saving: saves matrix table metadata in JSON format
- MatrixTableConsumer().prepare_metadata_for_loading: loads table metadata
- MatrixTableConsumer().collect: returns num_rows rows from a vcf file (.vcf.gz files can also be opened)
- MatrixTableConsumer().collect_all: collects all table rows from a vcf file (.vcf.gz files can also be opened)
- MatrixTableConsumer().convert_rows_to_hail: converts rows to Matrix Table format
- MatrixTableConsumer().create_hail_table: collects a table from rows
- MatrixTableConsumer().combine_hail_matrix_table_and_table: combines a MatrixTable and a Table
- MatrixTableConsumer().count: returns the number of rows in the vcf file
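
To make the row-oriented methods above more concrete, here is a minimal standard-library sketch of what "collect num_rows rows" and "count rows" can mean for a .vcf or .vcf.gz file. It is not the package's implementation and does not call MatrixTableConsumer. The helper names `open_vcf`, `iter_records`, `collect_rows`, and `count_rows` are hypothetical, and the sketch assumes that only data records (lines not starting with `#`) count as rows.

```python
import gzip
from itertools import islice
from pathlib import Path


def open_vcf(path):
    """Open a .vcf or .vcf.gz file in text mode (hypothetical helper)."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def iter_records(path):
    """Yield each data record as a list of tab-separated fields.

    Assumption: header lines (starting with '#') and blank lines are not rows.
    """
    with open_vcf(path) as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                yield line.rstrip("\r\n").split("\t")


def collect_rows(path, num_rows):
    """Return at most num_rows data records without reading the whole file."""
    return list(islice(iter_records(path), num_rows))


def count_rows(path):
    """Count data records by streaming through the file once."""
    return sum(1 for _ in iter_records(path))
```

Because the records are produced by a generator, `collect_rows` stops reading as soon as it has enough rows, which matters for large compressed files.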
See the main.ipynb file for examples of using MatrixTableConsumer.
See the benchmarks.md file for benchmarks comparing this program with bcftools.
You can filter .vcf and .vcf.gz files (the && and || operators are available):
vcf_tools -filter \
-o ./data/test_1.vcf \
-vcf ./data/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5b.20130502.genotypes.vcf.gz \
-i "(QUAL>=90 && AF>=0.00001) || AF>=0.001" \
-num_cpu 7
You can merge .vcf files:
vcf_tools -merge \
-vcf ./data/test1.vcf \
-vcf2 ./data/test2.vcf \
-o ./data/test_merged.vcf
It is possible to merge multiple vcf files:
vcf_tools -merge \
--file_with_vcfs ./data/vcfs.txt \
-o ./data/test_merged.vcf
where vcfs.txt contains:
./data/merge/test1.vcf
./data/merge/test2.vcf
You can view vcf files from the terminal:
- arrow down -> next line
- arrow up -> previous line
- arrow right -> right
- arrow left -> left
- ENTER -> next page
- SPACE -> enter line number
- ESC -> quit
vcf_tools -view -vcf ./data/merge/test_merged.vcf
You can sort .vcf files:
vcf_tools -sort \
-vcf ./data/sort/test.vcf \
-o ./data/sort/test_sorted.vcf
You can index .vcf.gz files:
vcf_tools -index \
-vcf ./data/test.vcf.gz
You can convert a .vcf file to zarr (.vcz) format:
vcf_tools -save_vcf_as_zarr \
-vcf ./data/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5b.20130502.genotypes.vcf.gz \
-o ./data/test.vcz \
-show_progress \
-num_cpu 7
- MatrixTableConsumer().save_vcf_as_zarr: converts a .vcf file to zarr (.vcz) format
- MatrixTableConsumer().load_zarr_data: loads zarr data
- MatrixTableConsumer().sample_qc_analysis: performs sample quality analysis
- MatrixTableConsumer().run_gwas: runs GWAS
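
The filter expression used in the filtering example earlier, `(QUAL>=90 && AF>=0.00001) || AF>=0.001`, combines numeric comparisons with && and ||. The sketch below shows one way such an expression could be evaluated against a single record. It is an illustration only, not how vcf_tools parses its `-i` argument. It assumes that && binds tighter than || (as in C), that every comparison has the form FIELD op NUMBER, and that the record is already a flat dict of numeric values. The names `tokenize` and `evaluate` are hypothetical.

```python
import operator
import re

# Tokens: parentheses, logical operators, comparison operators, field names, numbers.
TOKEN = re.compile(
    r"\s*(\(|\)|&&|\|\||>=|<=|==|!=|>|<|[A-Za-z_]\w*|\d+\.?\d*(?:[eE][-+]?\d+)?)"
)
COMPARE = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">": operator.gt, "<": operator.lt,
}


def tokenize(expr):
    """Split the expression into tokens, rejecting anything unrecognized."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected input: {expr[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expr, record):
    """Evaluate expr against record (a dict mapping field name to number)."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_or():  # lowest precedence: a || b
        value = parse_and()
        while peek() == "||":
            take()
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():  # higher precedence: a && b
        value = parse_atom()
        while peek() == "&&":
            take()
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():  # a parenthesized group or FIELD op NUMBER
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        op, number = take(), take()
        if op not in COMPARE:
            raise ValueError(f"unknown comparison operator: {op!r}")
        return COMPARE[op](float(record[token]), float(number))

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token: {peek()!r}")
    return result
```

With this reading, a record passes the example expression if it has both QUAL of at least 90 and AF of at least 0.00001, or if its AF alone is at least 0.001. In a real VCF file AF is stored inside the INFO column and may hold several values, which this sketch does not handle.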
To run tests, use:
pytest tests