Dear all,
I have really tried to find a clear answer on Google, but it seems to me that only the creators of VarScan understand its parameters.
I would appreciate it if somebody could explain to me, with an example, what the min-reads2 parameter does in the VarScan germline caller.
I tried changing this value on a set of samples:
min-reads2=1 ----> number of SNPs = 58
min-reads2=2 ----> number of SNPs = 58
min-reads2=3 ----> number of SNPs = 58
min-reads2=4 ----> number of SNPs = 58
min-reads2=20 ----> number of SNPs = 31
So the only difference appeared when I set min-reads2 to a high value. When I compare all the VCFs, I can see that the 27 missing variants are only those where AC=1 (allele count in genotypes). So the min-reads2 filter probably depends on the AC value.
Does anybody understand this parameter? Please do not copy the explanation from the manual (Minimum supporting reads at a position to call variants [2]); I need an explanation with an example.
Thank you very much.
John.
Hi John, I think there are default parameters affecting the final outcome. The default somatic p-value is 0.05, so this might be trumping calls with low read-depth support even when min-reads2 is altered. And as you noticed, there can be other parameters, like the AC value, having an effect too. You can try lowering the p-value threshold (--somatic-p-value) and then altering min-reads2.
Thank you, Amitm. I have been playing with all the parameters. But do you understand what the --min-reads2 parameter does? Also, I am working with VarScan for germline mutations. Thank you.
As far as I understand, that's the minimum read support required for the variant allele (repeating what is written on the website). So in germline mode (you are probably using mpileup2snp), that's the minimum number of reads that must support the variant allele. An example (a VCF line from single-sample variant calling using mpileup2snp):
For this variant, the reference-supporting reads were 149 (5th value, RD, of the last column) and the variant-allele-supporting reads (6th value, AD, of the last column) were 84. The --min-reads2 parameter controls how low the AD value can go.
Great... Looking at my data, it makes sense now. So doesn't that mean that the --min-var-freq parameter is almost the same? When I look at your data, AD = 84 and ADP = 233, so your frequency is 36.05% (the computation checks out). So basically I only need to set either --min-var-freq or --min-reads2, don't I? The algorithm probably applies --min-reads2 first and then checks the --min-var-freq condition.
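The numbers in this exchange can be checked with a minimal Python sketch. The helper names `variant_frequency` and `passes_thresholds` are hypothetical and not part of VarScan; the sketch only mimics the two cutoffs being discussed and makes no claim about the order in which VarScan applies them internally. It suggests why the two parameters are not interchangeable: one is an absolute read count, the other a proportion.

```python
def variant_frequency(rd: int, ad: int) -> float:
    """Fraction of reads at the site that support the variant allele."""
    depth = rd + ad
    return ad / depth if depth else 0.0


def passes_thresholds(rd: int, ad: int, min_reads2: int, min_var_freq: float) -> bool:
    """True if the call meets both cutoffs (hypothetical helper, not VarScan code)."""
    return ad >= min_reads2 and variant_frequency(rd, ad) >= min_var_freq


# Values quoted in the thread: RD = 149, AD = 84.
print(f"{variant_frequency(149, 84):.2%}")  # 36.05%

# Made-up sites showing that the two cutoffs catch different cases:
print(passes_thresholds(rd=2, ad=3, min_reads2=10, min_var_freq=0.2))      # high frequency, too few reads
print(passes_thresholds(rd=8316, ad=84, min_reads2=10, min_var_freq=0.2))  # many reads, low frequency
```

Both made-up sites are rejected, each by a different cutoff, while the RD = 149, AD = 84 call passes either one at these settings.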
Hi, they are two ways of filtering or restricting the calls. You could use either or both. For amplicon-seq data I do it like this: 1) First call all variants (using mpileup2snp) with a moderate read-depth criterion (let's say --min-reads2 at 10) and the desired --min-var-freq (maybe 1%). 2) Then use the filter module to apply a stronger read-depth criterion, let's say --min-reads2 at 30 this time. This returns low-frequency calls but with the added support of good read depth. The advantage of making this two-tiered is that you can look for low-confidence calls in the first VCF if needed.
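The two-tiered idea can be sketched in Python on a few made-up calls. The `Call` type, the `keep` helper and all positions and read counts below are invented for illustration; only the thresholds (10 and 30 reads, 1% frequency) come from the post, and the sketch is not how VarScan implements its filter module.

```python
from typing import NamedTuple


class Call(NamedTuple):
    pos: int  # position (made-up)
    rd: int   # reads supporting the reference allele
    ad: int   # reads supporting the variant allele


def keep(call: Call, min_reads2: int, min_var_freq: float) -> bool:
    """Apply both cutoffs to one call (hypothetical helper)."""
    depth = call.rd + call.ad
    freq = call.ad / depth if depth else 0.0
    return call.ad >= min_reads2 and freq >= min_var_freq


calls = [Call(101, 985, 15), Call(202, 2950, 50), Call(303, 995, 5)]

# Tier 1: moderate read support, low frequency threshold.
tier1 = [c for c in calls if keep(c, min_reads2=10, min_var_freq=0.01)]
# Tier 2: same frequency threshold, stricter read support.
tier2 = [c for c in tier1 if keep(c, min_reads2=30, min_var_freq=0.01)]
# Calls that survive tier 1 only remain available for manual review.
low_confidence = [c for c in tier1 if c not in tier2]

print([c.pos for c in tier1], [c.pos for c in tier2], [c.pos for c in low_confidence])
```

Keeping the tier 1 result around is what makes the low-confidence calls recoverable later, which is the advantage described above.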
Thank you so much for the clarification. I tried VarScan filter, and if I use just --min-var-freq and --min-reads2 it works, but it still also filters by strand (failed strands). I can see in the manual that the strand filter is not an option there (only for somaticFilter). Do you have the same experience, and why do some reads still fail the strand filter? Can I turn it off? Thank you for sharing your experience.
Good that you could resolve some of your queries. As for the strand filter, are you still talking about mpileup2snp? If so, check the --strand-filter option. By default it's 1. You can turn it off by passing the value 0 instead. Generally it's not a good idea to turn off the strand filter. Whether in somatic mode or single-sample/germline calling mode, I keep it on. The filter comes into play when there is a high imbalance in the number of variant-supporting reads from the plus strand vs. the minus strand. Check the ADF and ADR values in the VCFs in the case of single-sample calling, and the DP4 value in the case of somatic-mode calling.
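As a rough illustration of what a strand-imbalance check looks like, here is a minimal Python sketch using the ADF and ADR counts. The function name `fails_strand_filter` is hypothetical, and the 90% cutoff is an assumption based on how the VarScan manual words the option (ignore variants with more than 90% support on one strand); VarScan's actual check may differ in detail.

```python
def fails_strand_filter(adf: int, adr: int, max_one_strand: float = 0.90) -> bool:
    """Flag a variant whose supporting reads come almost entirely from one strand.

    adf / adr: variant-supporting reads on the forward / reverse strand.
    max_one_strand: assumed cutoff, not taken from VarScan source code.
    """
    total = adf + adr
    if total == 0:
        return False  # no variant reads, so there is no imbalance to judge
    return max(adf, adr) / total > max_one_strand


print(fails_strand_filter(40, 44))  # balanced support
print(fails_strand_filter(19, 1))   # 95% of support on the forward strand
```

A call with balanced ADF and ADR passes, while one supported almost only by forward reads is flagged, which is the kind of call that tends to be a sequencing or alignment artifact.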