Samtools: Local Realignment Around Indel
1
7
Entering edit mode
13.4 years ago
Nanada ▴ 70

In samtools mpileup, adjust the option -m and -F to control when want to initiate indel realignment, any guideline to set the values for m and F for

  1. high coverage data (eg. >=20X)
  2. low coverage data (eg. <20X)

documentation at http://samtools.sourceforge.net/mpileup.shtml stated For 500 exomes, using -m 3 -F 0.0002 (3 supporting reads at minimum 0.02% frequency) is necessary to find singletons.

any hints how are this thresholds been set?

indel samtools mpileup • 8.4k views
ADD COMMENT
3
Entering edit mode

Have you considered using gatk indel realign? It's another step, but there is more documentation.

ADD REPLY
4
Entering edit mode
13.0 years ago
Chris Penkett ▴ 490

It's worth to mention that the defaults are -m 1 -F 0.002.

So -m 1 is already at the minimum value and 0.2% frequency is 1 read in 500. Thus samtools should set most indels as potential candidates up to a depth of 500 reads even if only one read actually has an indel. This is pretty relaxed if you're calling on only 1 exome or maybe up to ~5-10 exomes - and certainly not an issue for the 20x threshold you're interested in.

If you're calling indels on 500 exomes, you would easily have a depth of more than 1000 (and probably into the 10-100's of thousands) in most positions. So if only one sample has that indel - the default 0.2% would miss 10 reads from that sample if there are more than 5000 other reads at that position without it. Hence why it is suggested to drop it 10x lower if you are searching for a single indel in many exome sequences. Increasing -m to 3 is maybe to speed up samtools in these situations as there could be so many potential indels in 500 exomes, that to set all of them as candidates would make it very slow.

Quick calculation:

If you have an average depth of 20 per sample - your total depth in 500 exomes will be 20*500 = 10000. And imagine that the sample with an indel only has a depth of 10 reads - this is 0.1% (below the default) of the total depth, hence why you could miss a single indel in 500 samples without putting -F down to something lower like 0.02% as suggested in the help page.

Maybe if you're calling on 50 exomes you may want to try something intermediate like -m 2 -F 0.0005.

Chris

ADD COMMENT

Login before adding your answer.

Traffic: 2531 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6