VCF files: counting the number of variants in genomic windows of a chosen size
7.0 years ago
spiral01 ▴ 110

Is there a tool that counts the number of variants in each genomic window of a user-specified size? I am looking for something along the lines of vcftools --TajimaD, which takes the window size as an argument and calculates Tajima's D in each window, except that I simply want to count the variants in each window.

SNP • 4.8k views

Apologies for the repeat question. That is exactly what I was after. Thank you.

7.0 years ago

Via BEDOPS:

$ bedmap --echo --count genes.bed <(vcf2bed < variants.vcf) > answer.bed

If your genes are in another format, say GFF:

$ bedmap --echo --count <(gff2bed < genes.gff) <(vcf2bed < variants.vcf) > answer.bed

If you have generic windows, replace genes.bed with a windows.bed of your design.
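
One way to build such a windows.bed, sketched with plain awk. This assumes a two-column, tab-separated chromosome-sizes file (name, then length); the function name make_windows and the file name chrom.sizes are hypothetical, not part of BEDOPS.

```shell
# make_windows SIZE CHROM_SIZES_FILE
# Hypothetical helper: emits fixed-size, non-overlapping BED windows
# (0-based, half-open) for each chromosome listed in a two-column
# "name<TAB>length" file. The last window is truncated at the chromosome end.
make_windows() {
  awk -v w="$1" 'BEGIN { FS = OFS = "\t" }
    {
      for (start = 0; start < $2; start += w) {
        end = (start + w < $2) ? start + w : $2
        print $1, start, end
      }
    }' "$2"
}
```

For example, for 100 kb windows, sorted the way bedmap expects its input:

$ make_windows 100000 chrom.sizes | sort-bed - > windows.bed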


Hi, I am getting segmentation fault: 11 when using the first bedmap command as such:

bedmap --echo --count windows.bed <(vcf2bed < chr21.vcf.gz) > chr21.coverage.txt

The final output keeps giving me a count of 0 for each window. How should I interpret this?


The file chr21.vcf.gz is not a plain-text VCF file; it is a gzip-compressed binary. Decompress it and pipe the decompressed data to vcf2bed, e.g.:

$ bedmap --echo --count --delim '\t' windows.bed <(gunzip -c chr21.vcf.gz | vcf2bed -) > windows_with_counts_of_variants.bed

Interpretation: If some of your windows are not on chr21, and all the variants in chr21.vcf.gz are from chr21, then expect zero counts for the windows that are not on that chromosome.
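
A quick way to check this, as a rough sketch: list the chromosome names that appear in the windows and in the VCF, and compare them. The function names below are hypothetical; the file names are the ones used above. Since overlaps are matched on the chromosome name, a naming mismatch (for instance chr21 in one file and 21 in the other) would also give zero counts in every window.

```shell
# Hypothetical helpers: list the distinct chromosome names in each input.
# bed_chroms reads column 1 of a tab-delimited BED file.
bed_chroms() { cut -f1 "$1" | sort -u; }
# vcf_chroms decompresses a gzipped VCF, skips header lines starting
# with '#', and reads column 1 (CHROM) of the variant records.
vcf_chroms() { gunzip -c "$1" | grep -v '^#' | cut -f1 | sort -u; }
```

$ bed_chroms windows.bed
$ vcf_chroms chr21.vcf.gz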
