13.7 years ago
Ian
I have been using the samtools pipeline for variant calling for a while now and have just moved over to mpileup. I am using the suggested method:
- samtools mpileup -uf ref.fa aln1.bam aln2.bam | bcftools view -bvcg - > var.raw.bcf
- bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf
I am aware that, at the vcfutils step, '-d' specifies the minimum read depth and '-D' specifies the maximum read depth. What is not clear to me is what happens if, for example, -D is 100 and the coverage at a SNP is 200. Are 100 random reads used or, more worryingly, is that SNP not reported?
Thanks.
Farhat gives the correct answer.
I have reopened this question, as it is crucial to know whether a SNP is called from sampled reads or skipped altogether when coverage is above -D. Thanks for your effort, Farhat, but 'appears' is not definite enough.
Thanks for the confirmation, Li. I had only taken a cursory look so didn't want to be too certain of the answer.
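To make the distinction concrete, here is a minimal Python sketch of a depth filter of this kind. It is not the vcfutils.pl code. It assumes, as I understand varFilter to behave, that a site whose DP falls outside the -d/-D range is dropped outright and no reads are subsampled. The function names, the example records, and the defaults (min_depth=2, max_depth=100) are hypothetical and only for illustration.

```python
def parse_depth(info):
    """Return the integer DP value from a VCF INFO string, or None if absent."""
    for field in info.split(";"):
        if field.startswith("DP="):
            return int(field[3:])
    return None


def depth_filter(vcf_lines, min_depth=2, max_depth=100):
    """Yield header lines and records whose DP lies in [min_depth, max_depth].

    Illustrative only: mimics the assumed effect of -d/-D, not the real script.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # header lines pass through untouched
            continue
        cols = line.rstrip("\n").split("\t")
        dp = parse_depth(cols[7])  # INFO is the 8th VCF column
        if dp is not None and (dp < min_depth or dp > max_depth):
            continue  # the whole site is skipped; nothing is subsampled
        yield line


# Made-up records: one site at depth 60, one at depth 200.
records = [
    "##fileformat=VCFv4.1",
    "chr1\t100\t.\tA\tG\t50\t.\tDP=60;AF1=1",
    "chr1\t200\t.\tC\tT\t99\t.\tDP=200;AF1=0.5",
]
kept = list(depth_filter(records, max_depth=100))
```

With max_depth=100, the site at depth 200 simply does not appear in `kept`, which is the "SNP not reported" case from the question.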
My understanding is that if only unique or "reliable" (per the samtools FAQ) sequences are used for SNP calling, the -D option is meaningless, since the high coverage is then not caused by duplication of genome segments.