This repository provides two classes: MatrixTableConsumer, which performs operations on Hail matrix tables, and VCFTools, which provides VCF tool functionality.

PHILIPP111007/matrix_table_consumer

MatrixTableConsumer v1.2.10

To install this package, run the following command (Go must be installed):

pip install .

To compile the Go modules as a C shared library that Python can load, run:

export CGO_ENABLED=1

go build -o main.so -buildmode=c-shared main.go

MatrixTableConsumer

The MatrixTableConsumer class performs operations on Hail matrix tables:

  • MatrixTableConsumer().prepare_metadata_for_saving saves matrix table metadata in JSON format

  • MatrixTableConsumer().prepare_metadata_for_loading loads matrix table metadata

  • MatrixTableConsumer().collect returns num_rows rows from a VCF file (.vcf.gz files are also supported)

  • MatrixTableConsumer().collect_all collects all rows from a VCF file (.vcf.gz files are also supported)

  • MatrixTableConsumer().convert_rows_to_hail converts rows to MatrixTable format

  • MatrixTableConsumer().create_hail_table creates a Hail Table from rows

  • MatrixTableConsumer().combine_hail_matrix_table_and_table combines a MatrixTable and a Table

  • MatrixTableConsumer().count returns the number of rows in a VCF file

See the main.ipynb file for examples of using MatrixTableConsumer.
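
To make the row-oriented methods above more concrete, here is a minimal pure-Python sketch of counting and collecting data rows from a .vcf or .vcf.gz file. It is not the package's implementation (the Go internals are not shown here). The function names open_vcf, iter_records, count_records and collect_records are hypothetical, and the sketch assumes that a "row" is any non-empty line that does not start with "#".

```python
import gzip
from itertools import islice


def open_vcf(path):
    """Open a plain or gzip-compressed VCF file in text mode (hypothetical helper)."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def iter_records(path):
    """Yield data lines, skipping '#' header lines and blank lines (assumption)."""
    with open_vcf(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            yield line.rstrip("\n")


def count_records(path):
    """Count data rows without loading the whole file into memory."""
    return sum(1 for _ in iter_records(path))


def collect_records(path, num_rows):
    """Return at most num_rows data rows from the start of the file."""
    return list(islice(iter_records(path), num_rows))
```

Reading lazily with a generator keeps memory use flat even for large files; only collect_records materializes a list, and only up to num_rows entries.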

Filter

See the benchmarks.md file for a benchmark comparing this program with bcftools.

You can filter .vcf and .vcf.gz files (the && and || operators are available):

vcf_tools -filter \
    -o ./data/test_1.vcf \
    -vcf ./data/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5b.20130502.genotypes.vcf.gz \
    -i "(QUAL>=90 && AF>=0.00001) || AF>=0.001" \
    -num_cpu 7
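
As an illustration of what the expression in the command above selects, here is a minimal Python sketch that applies the same condition to a single VCF data line. It is not how vcf_tools evaluates expressions. The helper names (parse_info, to_float, ge, passes_filter) are hypothetical, and two behaviours are assumptions of this sketch only: a missing QUAL or AF fails every comparison, and for multi-allelic sites only the first AF value is used.

```python
def parse_info(info):
    """Split a VCF INFO column into a dict; flag entries map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def to_float(text):
    """Return a float, or None when the value is missing or non-numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def ge(value, threshold):
    """Missing values never satisfy a comparison (assumption of this sketch)."""
    return value is not None and value >= threshold


def passes_filter(line):
    """Evaluate (QUAL>=90 && AF>=0.00001) || AF>=0.001 for one data line."""
    cols = line.rstrip("\n").split("\t")
    qual = to_float(cols[5])          # column 6 is QUAL
    raw_af = parse_info(cols[7]).get("AF")  # column 8 is INFO
    # Assumption: use the first AF value when several alleles are listed.
    af = to_float(raw_af.split(",")[0]) if isinstance(raw_af, str) else None
    return (ge(qual, 90) and ge(af, 0.00001)) or ge(af, 0.001)
```

A record with QUAL 95 and AF=0.0005 passes through the first clause, while a record with QUAL 50 passes only if its AF is at least 0.001.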

Merge

You can merge .vcf files:

vcf_tools -merge \
    -vcf ./data/test1.vcf \
    -vcf2 ./data/test2.vcf \
    -o ./data/test_merged.vcf

It is also possible to merge multiple VCF files listed in a text file:

vcf_tools -merge \
    --file_with_vcfs ./data/vcfs.txt \
    -o ./data/test_merged.vcf

where vcfs.txt contains one path per line:

./data/merge/test1.vcf
./data/merge/test2.vcf
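
If the files to merge live in one directory, a list file like the one above can be generated rather than written by hand. The following is a minimal sketch; the function name write_vcf_list is hypothetical, and it only picks up files ending in .vcf because that is what the merge examples here use.

```python
from pathlib import Path


def write_vcf_list(directory, list_path):
    """Write one .vcf path per line, in sorted order, and return the paths."""
    paths = sorted(Path(directory).glob("*.vcf"))
    text = "".join(f"{path.as_posix()}\n" for path in paths)
    Path(list_path).write_text(text)
    return paths
```

For example, write_vcf_list("./data/merge", "./data/vcfs.txt") would produce a file that can be passed to --file_with_vcfs.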

View

You can view VCF files in the terminal:

vcf_tools -view -vcf ./data/merge/test_merged.vcf

The viewer uses the following keys:

  • arrow down -> next line
  • arrow up -> previous line
  • arrow right -> scroll right
  • arrow left -> scroll left
  • ENTER -> next page
  • SPACE -> enter a line number
  • ESC -> quit

Sort

You can sort .vcf files:

vcf_tools -sort \
    -vcf ./data/sort/test.vcf \
    -o ./data/sort/test_sorted.vcf
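
The sort order used by vcf_tools is not documented here. As a rough illustration of what sorting a VCF file usually means, the sketch below keeps header lines first and orders data lines by chromosome and then by position. The ordering rule (numeric chromosomes before named ones such as X, with an optional "chr" prefix ignored) and the function names chrom_key and sort_vcf_lines are assumptions of this sketch, not the tool's behaviour.

```python
def chrom_key(chrom):
    """Order numeric chromosomes numerically, then the rest alphabetically."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    if name.isdigit():
        return (0, int(name), "")
    return (1, 0, name)


def sort_vcf_lines(lines):
    """Return header lines unchanged, followed by records sorted by CHROM, POS."""
    header = [line for line in lines if line.startswith("#")]
    records = [line for line in lines if line and not line.startswith("#")]

    def record_key(line):
        chrom, pos = line.split("\t", 2)[:2]
        return (chrom_key(chrom), int(pos))

    return header + sorted(records, key=record_key)
```

Because Python's sorted is stable, records that share the same chromosome and position keep their original relative order.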

Index

You can index .vcf.gz files:

vcf_tools -index \
    -vcf ./data/test.vcf.gz

Zarr format

You can convert a .vcf file to Zarr (.vcz) format:

vcf_tools -save_vcf_as_zarr \
    -vcf ./data/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5b.20130502.genotypes.vcf.gz \
    -o ./data/test.vcz \
    -show_progress \
    -num_cpu 7

MatrixTableConsumer also provides the following methods:

  • MatrixTableConsumer().save_vcf_as_zarr converts a .vcf file to Zarr (.vcz) format

  • MatrixTableConsumer().load_zarr_data loads Zarr data

  • MatrixTableConsumer().sample_qc_analysis runs sample quality analysis

  • MatrixTableConsumer().run_gwas runs a GWAS

Tests

To run tests, use:

pytest tests
