Counting the number of SNPs (VCF) for each genomic coordinates (BED)
1
0
Entering edit mode
3.1 years ago
r.tor ▴ 50

I want to count the number of recorded variations for each genomic coordinates of a .bed file from the corresponding .vcf file. I guess it should be solved by vcftools, but I could not find any suitable option to use.

Let's suppose the bed file like this:

> chr10.bed
chr10   200383  202383  ENSG00000212331
chr10   224433  226434  ENSG00000015171
chr10   224453  226453  ENSG00000259741
chr10   254333  256391  ENSG00000015171
chr10   284402  286402  ENSG00000015171
chr10   288911  290911  ENSG00000015171

and the vcf file (the Data part just to show) as:

>chr10.vcf
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT GTEX-14PKV
10      200901   10_66040_CAG_C_b37      CAG     C       .       PASS    .       GT      0|0
10      225434   10_66326_A_G_b37        A       G       .       PASS    .       GT      0|0
10      225151   10_68431_T_TA_b37       T       TA      .       PASS    .       GT      0|0
10      254737   10_71776_T_C_b37        T       C       .       PASS    .       GT      1|1
10      254367   10_72040_CAAT_C_b37     CAAT    C       .       PASS    .       GT      0|0
10      286302   10_75794_G_T_b37        G       T       .       PASS    .       GT      1|1
10      290011   10_79191_G_A_b37        G       A       .       PASS    .       GT      0|0

Then, the output can be something like this:

> output_chr10.bed
ENSG00000212331  1
ENSG00000015171  0
ENSG00000259741  2
ENSG00000015171  3
ENSG00000015171  1
ENSG00000015171  1

Thank you in advance.

vcf vcftools bed bedtools • 1.3k views
ADD COMMENT
1
Entering edit mode
3.1 years ago
awk -F '\t' '{printf("%s:%d-%s %s\n",$1,int($2)+1,$3,$4);}' input.bed | while read R G ; do echo -n "${G} " && bcftools query --regions "${R}" -f 'A' input.vcf.gz | wc -c ; done
ADD COMMENT
0
Entering edit mode

Thanks. using the code, there is an error regarding the vcf file format. The error says: Could not retrieve index file for chr10.vcf.gz and also says: Failed to read from chr10.vcf.gz could not load index.

ADD REPLY
1
Entering edit mode

Tabix it.

ADD REPLY

Login before adding your answer.

Traffic: 2140 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6