When is the SM tag different for different reads of the same sample?
7.4 years ago

I am looking to re-call a large number of legacy exomes which have been previously called (using different pipelines). To do this, I am following the Broad's best practices to revert a BAM to a uBAM: http://gatkforums.broadinstitute.org/firecloud/discussion/6484/

That process makes sense, but while looking through the reads I noticed that the SM tag, which I understand to mean "Sample", differs between reads of the same sample. For example, here are two reads from one sample (nucleotides redacted):

Read1:

H7FL7BBXX160404:2:2115:11809:26494  99  1   10004   18  3S60M1I9M3S =   10359   429  "sequence here" @>?>?@B@@B@@CAAB@@CAAB@@CAAB@@CAAB@@CAAB@@AAAB@@CAA?@@@AAB@@:AAB@@@;A>A>==@=   MC:Z:2S74M  MD:Z:69 PG:Z:MarkDuplicates.A   RG:Z:H7FL7.2 AM:i:18 NM:i:1  SM:i:18 MQ:i:18 OQ:Z:AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJFJJFJJJJJ7JJJJJJ<JFJJJJJF   UQ:i:0

Read2:

H7FL7BBXX160404:2:1124:24282:45361    163 1   10005   15  27M1I48M    =   10247   318 "sequence here" <:>>?@>>A@@=>>A@@@>>A@@@>>>@AA???B@@A??>AAA@@CAAB@?CAAB@@@AAB@@C1AA??9?>><=#    MC:Z:76M    MD:Z:3A5A5A5A6A10A35    PG:Z:MarkDuplicates.A   RG:Z:H7FL7.2   AM:i:15 NM:i:7  SM:i:15 MQ:i:15 OQ:Z:AAFFFJJJJJJFJJJJJJJJJJJJJJFJJJJJJJJJJJJAJJJJJJJJJJJJJJJJJFJJJJJJ<JJJJ7JJJJJ#   UQ:i:242

In the reads above, I do not understand why the first read has SM:i:18 while the second is SM:i:15 - aren't these from the same sample? Shouldn't they be the same? What am I missing here?

Thanks!

sequencing exome • 2.7k views

was that SM tag, which I understand to mean "Sample"

That's not true: on a read, SM stands for "Template-independent mapping quality". https://github.com/samtools/hts-specs/blob/master/SAMtags.pdf
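
To check what type a per-read tag carries, the optional fields of a SAM record can be split into TAG:TYPE:VALUE triples. This is only a minimal sketch in plain Python: it assumes tab-separated SAM text (for example from `samtools view`), the function name `parse_optional_fields` is made up for illustration, and the record is a shortened stand-in for Read1 with placeholder name, CIGAR, sequence and qualities.

```python
def parse_optional_fields(sam_line):
    """Return {tag: (type, value)} for the optional fields of one SAM record."""
    columns = sam_line.rstrip("\n").split("\t")
    fields = {}
    for item in columns[11:]:  # the first 11 columns are mandatory
        tag, type_code, value = item.split(":", 2)
        if type_code == "i":  # type 'i' is a signed integer
            value = int(value)
        fields[tag] = (type_code, value)
    return fields


# Hypothetical record modelled on Read1; "*" stands in for SEQ and QUAL.
record = "\t".join([
    "read1", "99", "1", "10004", "18", "76M", "=", "10359", "429",
    "*", "*", "RG:Z:H7FL7.2", "AM:i:18", "NM:i:1", "SM:i:18",
])

tags = parse_optional_fields(record)
print(tags["SM"])  # ('i', 18)
print(tags["RG"])  # ('Z', 'H7FL7.2')
```

SM on the read is an integer (type i), which fits a per-read mapping quality, while the read group is carried separately as a string in RG:Z.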


The friendlier URL is http://samtools.github.io/hts-specs/SAMtags.pdf; better to point people to that than to a URL containing details like branch names that may change in future.

7.4 years ago

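I think you are confusing the SM field of the @RG header line with the SM tag in each read's optional fields. They are unrelated: in the header, each @RG entry has an SM field that names the sample for that read group, whereas SM:i on a read is a per-read value and can differ from read to read. Each read points to its read group through its RG:Z tag (RG:Z:H7FL7.2 in your reads). If you look at the header of the bam file you should see the sample name there. E.g.: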

samtools view -H TCRBOA1-N-WEX.bam | grep '^@RG'
@RG ID:TCRBOA1-N-WEX.bam    LB:TCRBOA1-N-WEX.bam    PL:illumina SM:TCRBOA1-N-WEX.bam    PU:NA
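
If you want the sample for a given read, one way is to look up the read's RG:Z value among the @RG header lines. The following is a rough sketch, assuming tab-separated SAM text such as `samtools view -h` prints. The helper names `read_group_samples` and `sample_of` are made up for illustration, and the sample name `sample_A` is a placeholder because the real header of the file in the question is not shown.

```python
def read_group_samples(header_lines):
    """Map each @RG ID to the value of its SM (sample) field."""
    samples = {}
    for line in header_lines:
        if not line.startswith("@RG"):
            continue
        parts = line.rstrip("\n").split("\t")[1:]
        fields = dict(part.split(":", 1) for part in parts)
        samples[fields["ID"]] = fields.get("SM")
    return samples


def sample_of(sam_line, samples):
    """Return the sample for one read via its RG:Z tag, or None if absent."""
    for item in sam_line.rstrip("\n").split("\t")[11:]:
        if item.startswith("RG:Z:"):
            return samples.get(item[len("RG:Z:"):])
    return None


# Hypothetical header; the sample name is a placeholder.
header = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@RG\tID:H7FL7.2\tSM:sample_A\tPL:illumina",
]
# Shortened stand-ins for the two reads, with "*" for SEQ and QUAL.
read1 = "\t".join(["r1", "99", "1", "10004", "18", "76M", "=", "10359", "429",
                   "*", "*", "RG:Z:H7FL7.2", "SM:i:18"])
read2 = "\t".join(["r2", "163", "1", "10005", "15", "76M", "=", "10247", "318",
                   "*", "*", "RG:Z:H7FL7.2", "SM:i:15"])

samples = read_group_samples(header)
print(sample_of(read1, samples))  # sample_A
print(sample_of(read2, samples))  # sample_A
```

Both reads resolve to the same sample even though their SM:i values differ, because the sample comes from the read group and not from the per-read SM tag.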