Hi guys, need some help filtering my vcf files with the VariantAnnotation package (v1.12.9) from Bioconductor. I have a vcf file (v4.2) containing 28238 SNPs for 160 individuals, and I'm running R version 3.1.3
I opened my file and ran:
hist(geno(vcf)$DP) ##check the distribution of Read Depth
hist(geno(vcf)$GQ) ##check the distribution of Phred-Scaled Genotype Quality
I then tried to filter out all samples with a DP<10 and a GQ<20 and create a new vcf file:
vcf2 <- vcf[geno(vcf)$DP>10 && geno(vcf)$GQ>20]
The problem is that this code does not seem to be working, because both files have exactly the same dimensions:
dim(vcf) ##28238 160
dim(vcf2) ##28238 160
So my questions are:
- How can I filter vcf files by DP and GQ?
- Is it better to eliminate SNPs that have low DP & GQ, or to re-code genotypes with low DP and GQ as missing genotypes (
./.
)?
Any help would be much appreciated, thanks!
Rodolfo
To fix your code, use
&
(vector and) rather than&&
(scalar and) as in response to your other version of this question.