Entering edit mode
3.1 years ago
r.tor
▴
50
I want to count the number of recorded variations for each genomic coordinates of a .bed file from the corresponding .vcf file. I guess it should be solved by vcftools, but I could not find any suitable option to use.
Let's suppose the bed file like this:
> chr10.bed
chr10 200383 202383 ENSG00000212331
chr10 224433 226434 ENSG00000015171
chr10 224453 226453 ENSG00000259741
chr10 254333 256391 ENSG00000015171
chr10 284402 286402 ENSG00000015171
chr10 288911 290911 ENSG00000015171
and the vcf file (the Data part just to show) as:
>chr10.vcf
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GTEX-14PKV
10 200901 10_66040_CAG_C_b37 CAG C . PASS . GT 0|0
10 225434 10_66326_A_G_b37 A G . PASS . GT 0|0
10 225151 10_68431_T_TA_b37 T TA . PASS . GT 0|0
10 254737 10_71776_T_C_b37 T C . PASS . GT 1|1
10 254367 10_72040_CAAT_C_b37 CAAT C . PASS . GT 0|0
10 286302 10_75794_G_T_b37 G T . PASS . GT 1|1
10 290011 10_79191_G_A_b37 G A . PASS . GT 0|0
Then, the output can be something like this:
> output_chr10.bed
ENSG00000212331 1
ENSG00000015171 0
ENSG00000259741 2
ENSG00000015171 3
ENSG00000015171 1
ENSG00000015171 1
Thank you in advance.
Thanks. using the code, there is an error regarding the vcf file format. The error says: Could not retrieve index file for
chr10.vcf.gz
and also says: Failed to read fromchr10.vcf.gz
could not load index.Tabix it.