Hello,
I was wondering if anyone could help me filter my VCF file (from freebayes) to check for heterogeneity in my single-genome sequencing data.
I assume I should use:
vcftools --vcf my.vcf --maf 0.4 --recode --recode-INFO-all --out my_filtered.vcf
but I am not sure how to add an option for filtering on low coverage.
Thanks!
Can you provide more detail on your starting VCF, and exactly how you would like to filter it?
Depth of coverage is (generally) a sample-level statistic. There isn't one coverage value for the whole variant row; there's a value for every genotype call in the row. Assuming you have a multi-sample VCF, what do you want to do with low coverage genotypes? Remove the whole row if any genotypes have low coverage? Change the genotype to missing if the coverage is low? Etc.
On the other hand, you may have a DP statistic in the INFO field that represents something like the mean coverage across all samples; is this what you're referring to?
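To make the two options for low-coverage genotypes concrete, here is a minimal Python sketch. It assumes DP is present in the FORMAT column (freebayes normally writes it there). The function names, the threshold of 10, and the example row are all made up for illustration and are not taken from your file.

```python
# Sketch only: helper names, threshold, and the example row are hypothetical.
MISSING_GT = "./."

def sample_depths(fields):
    """Return the per-sample DP values (None when absent) for one VCF data row."""
    keys = fields[8].split(":")          # FORMAT column, e.g. GT:DP
    dp_idx = keys.index("DP")            # assumes DP is a FORMAT key
    depths = []
    for sample in fields[9:]:
        values = sample.split(":")
        raw = values[dp_idx] if dp_idx < len(values) else "."
        depths.append(int(raw) if raw.isdigit() else None)
    return depths

def drop_row_if_any_low(line, min_dp):
    """Option 1: discard the whole row if any genotype has depth below min_dp."""
    fields = line.rstrip("\n").split("\t")
    if any(d is None or d < min_dp for d in sample_depths(fields)):
        return None
    return line.rstrip("\n")

def mask_low_genotypes(line, min_dp):
    """Option 2: keep the row but set low-depth genotypes to missing."""
    fields = line.rstrip("\n").split("\t")
    gt_idx = fields[8].split(":").index("GT")
    for i, depth in enumerate(sample_depths(fields)):
        if depth is None or depth < min_dp:
            values = fields[9 + i].split(":")
            values[gt_idx] = MISSING_GT
            fields[9 + i] = ":".join(values)
    return "\t".join(fields)

# Made-up two-sample row: the first sample has depth 30, the second depth 5.
row = "chr1\t100\t.\tA\tG\t50\t.\tDP=35\tGT:DP\t0/1:30\t1/1:5"
print(drop_row_if_any_low(row, 10))   # row is dropped, so None is printed
print(mask_low_genotypes(row, 10))    # second genotype becomes ./.
```

In vcftools itself, I believe `--minDP` covers the genotype-level case (low-depth genotypes are set to missing) and `--min-meanDP` filters sites on mean depth across samples, but check the manual for your version before relying on either.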
So, I have a VCF file for a single genome (HIV provirus), and I would like to confirm that there really is only one genome by checking whether any genuine variants are present. I thought that if I removed rows for low-coverage variants and for minor alleles that appear at less than 40%, I could get an idea of how "clean" my sample is. My VCF looks like this: