Extract data from VCF file
7.7 years ago
inkprs ▴ 70

How can I extract the fields below from a VCF file?

I am looking for a Python parser for VCF files.

'ALLELE_CALL', 'IS_HETEROZYGOUS', 'NUM_READS', 'TOTAL_READ_DEPTH'

My VCF file looks like:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT MATERIAL1 MATERIAL2 MATERIAL..n
sequencing • 2.6k views

What are VARIANT_TYPE, SEQUENCE, ALLELE_CALL, VALUE, etc.? How can we know what you want to put in those columns?


Updated the question.

7.7 years ago
jzluo1 ▴ 10

They're probably in the INFO field. You can just use cut, or GATK VariantsToTable, or PyVCF. Lots of options!
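
If the values turn out to live in the per-sample columns rather than INFO, a dependency-free approach is also possible. The sketch below uses only the Python standard library and rests on an assumption the question does not confirm: that ALLELE_CALL and IS_HETEROZYGOUS can be derived from the GT key, NUM_READS from AD, and TOTAL_READ_DEPTH from DP. The function names and the demo record are made up for illustration.

```python
import io


def parse_sample(format_keys, sample_value, ref, alts):
    """Turn one sample column into the four requested fields.

    Assumed mapping (not confirmed by the question):
    GT -> ALLELE_CALL / IS_HETEROZYGOUS, AD -> NUM_READS, DP -> TOTAL_READ_DEPTH.
    """
    data = dict(zip(format_keys, sample_value.split(":")))
    alleles = [ref] + alts
    # Genotype indices are separated by "/" (unphased) or "|" (phased).
    indices = data.get("GT", ".").replace("|", "/").split("/")
    called = [alleles[int(i)] for i in indices if i.isdigit()]
    ad = data.get("AD", ".")
    dp = data.get("DP", ".")
    return {
        "ALLELE_CALL": "/".join(called) if called else None,
        "IS_HETEROZYGOUS": len(set(called)) > 1,
        "NUM_READS": [int(x) for x in ad.split(",") if x.isdigit()],
        "TOTAL_READ_DEPTH": int(dp) if dp.isdigit() else None,
    }


def iter_calls(handle):
    """Yield one dict per (variant, sample) pair from an open VCF text handle."""
    samples = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        alts = [] if alt == "." else alt.split(",")
        format_keys = fields[8].split(":")
        for name, value in zip(samples, fields[9:]):
            row = {"CHROM": chrom, "POS": int(pos), "SAMPLE": name}
            row.update(parse_sample(format_keys, value, ref, alts))
            yield row


# Made-up two-sample record, used only to exercise the parser.
demo = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tMATERIAL1\tMATERIAL2\n"
    "1\t100\t.\tA\tT\t50\tPASS\t.\tGT:AD:DP\t0/1:12,9:21\t1/1:0,17:17\n"
)
for row in iter_calls(io.StringIO(demo)):
    print(row)
```

For a real file, pass an open text handle instead of the StringIO object (use gzip.open(path, "rt") for a compressed VCF). A dedicated parser is still the safer choice for large or unusual files.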

7.7 years ago

Try @brentp's cyvcf2 (cython + htslib == fast VCF and BCF processing), a fast Python (2 and 3) parser for VCF and BCF files that supports region queries. It was published in Bioinformatics.
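
A rough sketch of how the four fields might be pulled out with cyvcf2 follows. It assumes that ALLELE_CALL corresponds to the called bases, IS_HETEROZYGOUS to the genotype class, NUM_READS to the alternate-allele depth, and TOTAL_READ_DEPTH to the per-sample depth; the question does not define these, so adjust the mapping as needed. The helper names summarize and extract_calls are hypothetical, and cyvcf2 is imported inside the function so the helpers can be loaded even where the library is not installed.

```python
# cyvcf2 reports per-sample genotype classes as integers in Variant.gt_types.
# These codes are the default encoding (VCF opened without gts012=True).
GT_TYPE_NAMES = {0: "HOM_REF", 1: "HET", 2: "UNKNOWN", 3: "HOM_ALT"}


def summarize(chrom, pos, sample, bases, gt_type, alt_depth, total_depth):
    """Build one output row; the field mapping here is an assumption."""
    return {
        "CHROM": chrom,
        "POS": pos,
        "SAMPLE": sample,
        "ALLELE_CALL": bases,
        "IS_HETEROZYGOUS": GT_TYPE_NAMES.get(gt_type) == "HET",
        "NUM_READS": alt_depth,
        "TOTAL_READ_DEPTH": total_depth,
    }


def extract_calls(path):
    """Yield one row per (variant, sample) pair from a VCF/BCF file."""
    from cyvcf2 import VCF  # third-party package: pip install cyvcf2

    vcf = VCF(path)
    for variant in vcf:
        for i, sample in enumerate(vcf.samples):
            yield summarize(
                variant.CHROM,
                variant.POS,
                sample,
                str(variant.gt_bases[i]),       # e.g. "A/T"
                int(variant.gt_types[i]),       # genotype class code
                int(variant.gt_alt_depths[i]),  # alt-allele reads (from AD)
                int(variant.gt_depths[i]),      # per-sample depth
            )


# Usage, with a hypothetical file name:
#     for row in extract_calls("my_variants.vcf.gz"):
#         print(row)
```

Depths come back as -1 when the underlying FORMAT values are missing, so it is worth filtering those rows before any downstream arithmetic.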
