Mutect2 gives different results when I change the downsampling level
7.5 years ago
lghust2011

I use MuTect2 from GATK 3.6 and GATK 3.7 to call variants. I know that MuTect2 downsamples reads and that this downsampling has an important influence on the result, so I changed the downsampling level. The default values are:

maxReadsInRegionPerSample = 1000;
minReadsPerAlignmentStart = 5;

I changed both parameters to larger values:

 maxReadsInRegionPerSample = 2000;
 minReadsPerAlignmentStart = 10;
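To see how the two parameters interact, here is a simplified model of this kind of downsampling, assuming it works by trimming reads from the most crowded alignment start until the region is under `maxReadsInRegionPerSample`, while never shrinking a start below `minReadsPerAlignmentStart`. The class and method names are hypothetical and this is not GATK's actual implementation; it only illustrates that which reads survive depends on a random draw, so any change to the caps changes the read set the caller sees.

```java
import java.util.*;

/** Hypothetical, simplified model of per-region downsampling. Not GATK's code. */
public class LevelingDownsampleSketch {

    /**
     * Trims stacks of reads (keyed by alignment start) until the region holds at most
     * maxReadsInRegion reads. Each step removes one random read from the currently
     * largest stack; a stack is never trimmed below minReadsPerStart.
     */
    static Map<Integer, List<String>> downsample(Map<Integer, List<String>> stacks,
                                                 int maxReadsInRegion,
                                                 int minReadsPerStart,
                                                 Random rng) {
        // Work on a copy so the caller's reads are untouched
        Map<Integer, List<String>> out = new TreeMap<>();
        int total = 0;
        for (Map.Entry<Integer, List<String>> e : stacks.entrySet()) {
            out.put(e.getKey(), new ArrayList<>(e.getValue()));
            total += e.getValue().size();
        }
        while (total > maxReadsInRegion) {
            // Locate the largest stack of reads sharing one alignment start
            List<String> largest = null;
            for (List<String> s : out.values()) {
                if (largest == null || s.size() > largest.size()) largest = s;
            }
            // Stop if even the largest stack is already at the per-start floor
            if (largest == null || largest.size() <= minReadsPerStart) break;
            largest.remove(rng.nextInt(largest.size())); // drop a random read
            total--;
        }
        return out;
    }

    static int count(Map<Integer, List<String>> stacks) {
        int n = 0;
        for (List<String> s : stacks.values()) n += s.size();
        return n;
    }

    public static void main(String[] args) {
        // Toy region: 50 alignment starts x 60 reads = 3000 reads
        Map<Integer, List<String>> region = new TreeMap<>();
        for (int start = 0; start < 50; start++) {
            List<String> reads = new ArrayList<>();
            for (int i = 0; i < 60; i++) reads.add("read_" + start + "_" + i);
            region.put(start, reads);
        }
        System.out.println(count(downsample(region, 1000, 5, new Random(1))));
        System.out.println(count(downsample(region, 2000, 10, new Random(1))));
    }
}
```

In this toy region the default-like caps keep 1000 reads and the doubled caps keep 2000, but the two runs keep different random subsets, not one a superset of the other.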

Then I recompiled the code, ran it, and got a result named downsample_2x.vcf. Compared with the result from the default settings (original.vcf), the result is very strange:

There are more variants in downsample_2x.vcf, which is easy to understand because many more reads are sampled. However, some variants are also lost: around 200 of the roughly 900 variants in original.vcf do not appear in downsample_2x.vcf. Since the sample got bigger, why are some variants missing? It's difficult for me to understand. If the result with more reads is more accurate, what about these 200 missing variants? Any reply will be much appreciated!
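When comparing the two runs it helps to be explicit about what counts as "the same variant" and whether filtered records are included, since a call can be present in both files but PASS in only one. Below is a minimal sketch, assuming plain uncompressed VCFs and matching on CHROM:POS:REF:ALT; the class name is hypothetical. Existing tools such as `bcftools isec` do the same job more robustly.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

/** Hypothetical helper: counts calls shared between / unique to two VCF files. */
public class VcfCallDiff {

    /** Collects CHROM:POS:REF:ALT keys from VCF body lines, optionally keeping only FILTER == PASS. */
    static Set<String> variantKeys(List<String> vcfLines, boolean passOnly) {
        Set<String> keys = new LinkedHashSet<>();
        for (String line : vcfLines) {
            if (line.isEmpty() || line.startsWith("#")) continue; // skip header lines
            String[] f = line.split("\t");
            if (f.length < 7) continue; // not a complete VCF record
            if (passOnly && !f[6].equals("PASS")) continue; // column 7 is FILTER
            keys.add(f[0] + ":" + f[1] + ":" + f[3] + ":" + f[4]);
        }
        return keys;
    }

    /** Returns the keys in a that are not in b. */
    static Set<String> minus(Set<String> a, Set<String> b) {
        Set<String> r = new LinkedHashSet<>(a);
        r.removeAll(b);
        return r;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java VcfCallDiff original.vcf downsample_2x.vcf
        Set<String> a = variantKeys(Files.readAllLines(Paths.get(args[0])), true);
        Set<String> b = variantKeys(Files.readAllLines(Paths.get(args[1])), true);
        Set<String> shared = new LinkedHashSet<>(a);
        shared.retainAll(b);
        System.out.println("shared: " + shared.size());
        System.out.println("only in " + args[0] + ": " + minus(a, b).size());
        System.out.println("only in " + args[1] + ": " + minus(b, a).size());
    }
}
```

Running it once with `passOnly` set to true and once with false shows whether the "missing" calls disappeared entirely or were only filtered.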

7.5 years ago
DG

There are a few things I can think of. This list isn't comprehensive and some of it could be wrong, but these would be my speculations:

1) When you are doing joint, sample-aware calling, increasing the number of samples improves the removal of highly recurrent false-positive calls that other filters don't catch. Basically, more samples improve your estimate of the error, so many of those edge cases drop out.

2) Doubling the number of reads you consider, even though downsampling is random, could be enough to push some variants that were formerly just above your threshold (in, say, allele frequency) to just below it.

I've found that the latter is typically what happens in my amplicon-sequencing samples. With standard variant-filtering thresholds and the like, I tend to see fewer variants overall at higher depth of coverage, because those edge-case false positives drop out.
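Point 2 can be illustrated with a small simulation. Mutect2 does not call on a raw allele-fraction cutoff (it relies on likelihood-based scores), so the fixed AF threshold below is a hypothetical stand-in, and the site (3000 reads, 30 supporting the alternate allele) is made up. The sketch repeatedly draws a random subset of reads without replacement and records how often the observed allele fraction clears the threshold; a borderline site is called in some draws and missed in others, and changing the number of reads kept changes which outcome a single run lands on.

```java
import java.util.Random;

/** Hypothetical simulation: how random downsampling moves a borderline site across a threshold. */
public class DownsampleThresholdSketch {

    /** Draws n of the depth reads without replacement and returns how many carry the alt allele. */
    static int altInSubsample(int depth, int altReads, int n, Random rng) {
        boolean[] isAlt = new boolean[depth];
        for (int i = 0; i < altReads; i++) isAlt[i] = true;
        int alt = 0;
        // Partial Fisher-Yates shuffle: positions 0..n-1 become a uniform random subset
        for (int i = 0; i < n; i++) {
            int j = i + rng.nextInt(depth - i);
            boolean tmp = isAlt[i];
            isAlt[i] = isAlt[j];
            isAlt[j] = tmp;
            if (isAlt[i]) alt++;
        }
        return alt;
    }

    /** Fraction of random downsamplings in which the observed allele fraction reaches minAf. */
    static double callRate(int depth, int altReads, int n, double minAf, int trials, long seed) {
        Random rng = new Random(seed);
        int called = 0;
        for (int t = 0; t < trials; t++) {
            int alt = altInSubsample(depth, altReads, n, rng);
            if ((double) alt / n >= minAf) called++;
        }
        return (double) called / trials;
    }

    public static void main(String[] args) {
        // Made-up site: 3000 reads, 30 alt (true AF = 1%), hypothetical threshold AF >= 1%
        for (int n : new int[] {1000, 2000, 3000}) {
            System.out.printf("keep %d reads -> called in %.1f%% of runs%n",
                    n, 100 * callRate(3000, 30, n, 0.01, 10_000, 42));
        }
    }
}
```

With all 3000 reads kept there is no randomness and the site is always called; with 1000 or 2000 reads kept, whether it is called depends on the draw.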
