I am looking to re-call a large number of legacy exomes which have been previously called (using different pipelines). To do this, I am following the Broad's best practices to revert a BAM to a uBAM: http://gatkforums.broadinstitute.org/firecloud/discussion/6484/
That process makes sense, but one question came up while I was mucking about in the reads: the SM tag, which I understand to mean "Sample", differs between reads from the same sample. For example, here are two reads from one sample (nucleotides redacted):
Read1:
H7FL7BBXX160404:2:2115:11809:26494 99 1 10004 18 3S60M1I9M3S = 10359 429 "sequence here" @>?>?@B@@B@@CAAB@@CAAB@@CAAB@@CAAB@@CAAB@@AAAB@@CAA?@@@AAB@@:AAB@@@;A>A>==@= MC:Z:2S74M MD:Z:69 PG:Z:MarkDuplicates.A RG:Z:H7FL7.2 AM:i:18 NM:i:1 SM:i:18 MQ:i:18 OQ:Z:AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJFJJFJJJJJ7JJJJJJ<JFJJJJJF UQ:i:0
Read2:
H7FL7BBXX160404:2:1124:24282:45361 163 1 10005 15 27M1I48M = 10247 318 "sequence here" <:>>?@>>A@@=>>A@@@>>A@@@>>>@AA???B@@A??>AAA@@CAAB@?CAAB@@@AAB@@C1AA??9?>><=# MC:Z:76M MD:Z:3A5A5A5A6A10A35 PG:Z:MarkDuplicates.A RG:Z:H7FL7.2 AM:i:15 NM:i:7 SM:i:15 MQ:i:15 OQ:Z:AAFFFJJJJJJFJJJJJJJJJJJJJJFJJJJJJJJJJJJAJJJJJJJJJJJJJJJJJFJJJJJJ<JJJJ7JJJJJ# UQ:i:242
In the reads above, I do not understand why the first read has SM:i:18 while the second is SM:i:15 - aren't these from the same sample? Shouldn't they be the same? What am I missing here?
Thanks!
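That's not true: as a per-read tag, SM stands for "Template-independent mapping quality", so it is expected to vary from read to read. The sample name is stored in the SM field of the @RG header line, which each read points to through its RG tag. https://github.com/samtools/hts-specs/blob/master/SAMtags.pdf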
The friendlier URL is http://samtools.github.io/hts-specs/SAMtags.pdf; it is better to point people to that than to a URL containing details like branch names that may change in future.
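To make the two meanings of SM concrete, here is a minimal Python sketch that pulls the per-read SM:i value out of a SAM record and, separately, looks up the sample name through the read group. It uses only plain string handling, no BAM library. The @RG header line and the sample name `sample_A` are made up for illustration, since the real header was not posted, and the two reads are cut down to a few of the tags shown in the question, with sequence and qualities replaced by `*`.

```python
def parse_tags(sam_line):
    """Return the optional fields (columns 12 onward) of a SAM record as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        tag, typ, value = field.split(":", 2)
        # Type 'i' is an integer; everything else is kept as a string here.
        tags[tag] = int(value) if typ == "i" else value
    return tags


def read_group_samples(header_lines):
    """Map each read-group ID to the sample name in its @RG header line."""
    samples = {}
    for line in header_lines:
        if not line.startswith("@RG"):
            continue
        pairs = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
        samples[pairs["ID"]] = pairs.get("SM")
    return samples


# Hypothetical header: the ID matches the RG tag in the question,
# but the sample name is an assumption.
header = ["@RG\tID:H7FL7.2\tSM:sample_A\tPL:ILLUMINA"]

# Abbreviated versions of the two reads from the question.
read1 = "\t".join([
    "H7FL7BBXX160404:2:2115:11809:26494", "99", "1", "10004", "18",
    "3S60M1I9M3S", "=", "10359", "429", "*", "*",
    "RG:Z:H7FL7.2", "AM:i:18", "NM:i:1", "SM:i:18",
])
read2 = "\t".join([
    "H7FL7BBXX160404:2:1124:24282:45361", "163", "1", "10005", "15",
    "27M1I48M", "=", "10247", "318", "*", "*",
    "RG:Z:H7FL7.2", "AM:i:15", "NM:i:7", "SM:i:15",
])

samples = read_group_samples(header)
for read in (read1, read2):
    tags = parse_tags(read)
    # tags["SM"] is a mapping quality; the sample comes from the header via RG.
    print(tags["RG"], "SM tag:", tags["SM"], "sample:", samples[tags["RG"]])
```

Both reads resolve to the same sample through their shared RG value, while their SM:i values differ because each is a per-read mapping quality. On a real file, `samtools view -H` prints the header so you can check the @RG lines directly.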