Remove variants from vcf file does not work
0
0
Entering edit mode
4 months ago
8armed ▴ 20

I have a vcf file from which I want to remove SNPs that fall within certain genomic windows (20kb window size). I continue to face a problem with this seemingly easy task. I have checked my code for mistakes several times, but I still don't know what is going on.

Here's what I do:

1. From my original vcf file, I extracted all SNP positions like this:

bcftools query -f '%CHROM\t%POS\n' vcfFile.vcf > vcfFile.SNP.Positions

The first 10 SNPs on chrI (chromosome I) are:

chrI    810 
chrI    1619 
chrI    1686 
chrI    2393 
chrI    2674 
chrI    2696 
chrI    4642 
chrI    4655 
chrI    4813 
chrI    5117

2. I then wanted to remove SNPs that fall within certain genomic windows (bins of 20kb). For this, I created a bed-file ("remove.bed").

This file looks like this (here I show the first ten 20kb-windows):

chrom chromStart chromEnd
chrI 10000011020000
chrI 1140001 1160000
chrI 1160001 1180000
chrI 3440001 3460000
chrI 6460001 6480000
chrI 6860001 6880000
chrI 6880001 6900000
chrI 6900001 6920000
chrI 6920001 6940000
chrI 6940001 6960000

Notably, this bed-file is ordered by chromosome and position in increasing order.

3. I then use vcftools to remove all SNPs from vcfFile.vcf like this:

vcftools \
--vcf vcfFile.vcf \
--exclude-bed remove.bed \
--recode \
--recode-INFO-all \
--out vcfFile_removed

This creates a new vcf file called: "vcfFile_removed.recode.vcf". As expected, this vcf file now has less SNPs than the initial vcf file.

4. However, I then looked at all SNP positions in vcfFile_removed.recode.vcf, by again using bcftools:

bcftools query -f '%CHROM\t%POS\n' vcfFile_removed.recode.vcf > vcfFile_removed.SNP.Positions

The first 10 SNPs on chrI (chromosome I) in this file are:

chrI 25980088
chrI 25980144
chrI 25980172
chrI 25980365
chrI 25980393
chrI 25980422
chrI 25980539
chrI 25980647
chrI 25980652
chrI 25980673

To make sure this has nothing to do with how the vcf file is sorted, I also grepped for "chrI 810" and "chrI 1619" in "vcfFile_removed.SNP.Positions", but these SNP positions were no longer found.

I don't understand why these SNPs were removed because they do not fall in any of the specified bins within the bed file.

bcftools vcf vcftools • 279 views
ADD COMMENT
0
Entering edit mode

Why is the first line of your BED malformed? Also, have you tried sorting your final VCF?

ADD REPLY

Login before adding your answer.

Traffic: 1292 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6