Hi all,
I am trying to filter a VCF file by read depth (min 10) and mapping quality (min 30) using vcftools. I have found it quite difficult to find example commands for anything like this, and being very new to bioinformatics and the command line in general, I'm fairly certain I'm doing something wrong. This is what I have tried so far:
vcftools --vcf myfile.vcf --minDP 10 --minQ 30 --out myfile_filtered
This generates
VCFtools - v0.1.12b
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--vcf ED132_55.vcf
--minDP 10
--minQ 30
--out ED132_55_filtered
After filtering, kept 1 out of 1 Individuals
After filtering, kept 87041 out of a possible 103802 Sites
Run Time = 0.00 seconds
It also produces an output file named "myfile_filtered.log", which contains nothing but the information printed above. There is no new VCF file, so I can only assume something is going wrong.
I would be very grateful if anyone could help me fix this command, or offer alternatives that might work better. I should note that as I am not using my own computer, I can't easily install new software packages and am restricted to bcftools and vcftools. Any Python or Perl scripts would be wonderful.
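For reference, here is a minimal sketch of the kind of Python script asked for above. It rests on assumptions that may not hold for this file: that read depth and mapping quality are stored in the INFO column under the keys DP and MQ (common, but caller-dependent), and that records lacking either key should be dropped. As far as I understand the vcftools manual, --minQ filters on the QUAL column and --minDP on the per-genotype DP value, so neither of them tests mapping quality. The function names here are made up for illustration.

```python
def parse_info(info_field):
    """Turn an INFO string such as 'DP=14;MQ=60;DB' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        info[key] = value  # flag entries (no '=') map to an empty string
    return info


def passes(line, min_dp=10, min_mq=30):
    """Return True if a VCF data line has INFO/DP >= min_dp and INFO/MQ >= min_mq."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return False  # not a complete VCF record (e.g. a blank line)
    info = parse_info(fields[7])
    try:
        return float(info["DP"]) >= min_dp and float(info["MQ"]) >= min_mq
    except (KeyError, ValueError):
        # Assumption: records with a missing or non-numeric DP/MQ are dropped.
        return False


def filter_vcf(lines, min_dp=10, min_mq=30):
    """Yield header lines unchanged, plus data lines that pass both thresholds."""
    for line in lines:
        if line.startswith("#") or passes(line, min_dp, min_mq):
            yield line
```

To apply it to a file, one option is to open "myfile.vcf" for reading and a new file for writing, then pass `filter_vcf(infile)` to the output file's `writelines` method.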
Hi Colin, thank you for your answer; I am now able to generate VCF files. I'm still not convinced that the command is working correctly, however. Upon examining the files I can see that there are some SNPs present in the unfiltered file that are missing from the filtered file even though they satisfy the criteria, while there are some SNPs in the filtered file that do not satisfy the criteria. The command has definitely removed some SNPs, but I'm not sure on what basis it has done so.
Once again, many thanks for taking the time to respond.
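One way to investigate on what basis SNPs were removed is to list the sites that are in the unfiltered file but not in the filtered one, together with their QUAL and INFO values. The following is only a rough sketch: it assumes both files are plain-text VCFs and that a site can be identified by CHROM, POS, REF and ALT, and the function names are hypothetical.

```python
def site_table(lines):
    """Map (CHROM, POS, REF, ALT) to (QUAL, INFO) for every data line."""
    table = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # skip blank or truncated lines
        key = (fields[0], fields[1], fields[3], fields[4])
        table[key] = (fields[5], fields[7])
    return table


def removed_sites(unfiltered_lines, filtered_lines):
    """Return sites present in the unfiltered input but absent from the filtered one."""
    before = site_table(unfiltered_lines)
    after = site_table(filtered_lines)
    return [(key, before[key]) for key in before if key not in after]
```

Printing the result for the two files and looking at the QUAL column next to the DP and MQ entries in INFO should show which value the removed sites have in common.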