SAM flag and NH tag
9.3 years ago
Pfs ▴ 580

If a read name is found multiple times in a SAM file, the tag (NH:i) is set to indicate how many times that read name occurs.

  1. Does the integer following NH count all occurrences, or only those of the first mate?
  2. Should the secondary-alignment bit in the SAM flag then be set?

For example:

Qname  flag  pos  rname pnext ...   tags
read1    83  1    chr3  100 .... NH:i:2
read1    83  10   chr3  110 .... NH:i:2
read1   147  100  chr3  1   .... NH:i:2
read1   147  110  chr3  10  .... NH:i:2

Why do none of the SAM flags indicate that read1 has a secondary alignment?

Thanks!

RNA-Seq SAM • 12k views

Can you paste the real lines from the SAM file, and also tell us which aligner was used and with what settings?


Here is the actual group of reads I was referring to. I would have expected the FLAG field to report some of these alignments as "secondary". It is hard to read, but there are 12 occurrences of the same read name, each with a mate, and none of the flags has bit 0x100 set. How do I decide which one is the primary alignment? Do I need to check the (optional) CP tag? Is there something wrong with this SAM file? I took it from the pasilla dataset; I did not generate this alignment myself. Thanks!

SRR031714.1000001    83    chr3L    5111807    0    37M    =    5111681    0    AGTCAGACTGTCCAGTTTGATGACAGCGTGGTGCGGC    ;IH-2I;IIIIII0IIFII:IIIIIII9IIIIIIIII    NM:i:0    NH:i:6    CC:Z:=    CP:i:22095583
SRR031714.1000001    83    chrX    3312445    0    37M    =    3312319    0    AGTCAGACTGTCCAGTTTGATGACAGCGTGGTGCGGC    ;IH-2I;IIIIII0IIFII:IIIIIII9IIIIIIIII    NM:i:0    NH:i:6
SRR031714.1000001    99    chr2R    14466109    0    37M    =    14466235    0    GCCGCACCACGCTGTCATCAAACTGGACAGTCTGACT    IIIIIIIII9IIIIIII:IIFII0IIIIII;I2-HI;    NM:i:0    NH:i:6    CC:Z:chr3L    CP:i:5105393
SRR031714.1000001    99    chr3L    22095583    0    37M    =    22095709    0    GCCGCACCACGCTGTCATCAAACTGGACAGTCTGACT    IIIIIIIII9IIIIIII:IIFII0IIIIII;I2-HI;    NM:i:0    NH:i:6    CC:Z:chrX    CP:i:325258
SRR031714.1000001    99    chr3L    5105393    0    37M    =    5105519    0    GCCGCACCACGCTGTCATCAAACTGGACAGTCTGACT    IIIIIIIII9IIIIIII:IIFII0IIIIII;I2-HI;    NM:i:0    NH:i:6    CC:Z:=    CP:i:5111807
SRR031714.1000001    99    chrX    325258    0    37M    =    325384    0    GCCGCACCACGCTGTCATCAAACTGGACAGTCTGACT    IIIIIIIII9IIIIIII:IIFII0IIIIII;I2-HI;    NM:i:0    NH:i:6    CC:Z:=    CP:i:3312445
SRR031714.1000001    147    chr2R    14466235    0    37M    =    14466109    0    ACGCGATCTTTTTGGCGTTTGTCTACGCTTCCGGCAG    BIIIIIGII@IIIIIIIIIIIIIIIIIIIIIIIIIII    NM:i:0    NH:i:6    CC:Z:chr3L    CP:i:5105519
SRR031714.1000001    147    chr3L    22095709    0    37M    =    22095583    0    ACGCGATCTTTTTGGCGTTTGTCTACGCTTCCGGCAG    BIIIIIGII@IIIIIIIIIIIIIIIIIIIIIIIIIII    NM:i:0    NH:i:6    CC:Z:chrX    CP:i:325384
SRR031714.1000001    147    chr3L    5105519    0    37M    =    5105393    0    ACGCGATCTTTTTGGCGTTTGTCTACGCTTCCGGCAG    BIIIIIGII@IIIIIIIIIIIIIIIIIIIIIIIIIII    NM:i:0    NH:i:6    CC:Z:=    CP:i:5111681
SRR031714.1000001    147    chrX    325384    0    37M    =    325258    0    ACGCGATCTTTTTGGCGTTTGTCTACGCTTCCGGCAG    BIIIIIGII@IIIIIIIIIIIIIIIIIIIIIIIIIII    NM:i:0    NH:i:6    CC:Z:=    CP:i:3312319
SRR031714.1000001    163    chr3L    5111681    0    37M    =    5111807    0    CTGCCGGAAGCGTAGACAAACGCCAAAAAGATCGCGT    IIIIIIIIIIIIIIIIIIIIIIIIIII@IIGIIIIIB    NM:i:0    NH:i:6    CC:Z:=    CP:i:22095709
SRR031714.1000001    163    chrX    3312319    0    37M    =    3312445    0    CTGCCGGAAGCGTAGACAAACGCCAAAAAGATCGCGT    IIIIIIIIIIIIIIIIIIIIIIIIIII@IIGIIIIIB    NM:i:0    NH:i:6 
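
To make the flags above easier to read, here is a minimal Python sketch that decodes the four FLAG values appearing in this excerpt (83, 99, 147, 163). The bit meanings are taken from the FLAG table in the SAM specification; the short names in `FLAG_BITS` and the helper `decode_flag` are made up for this illustration and are not part of any library.

```python
# Bit values follow the FLAG table of the SAM specification.
# The short names are illustrative labels, not official identifiers.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# The four distinct FLAG values present in the records above.
for flag in (83, 99, 147, 163):
    print(flag, decode_flag(flag))
```

Flags 83 and 99 have the first-in-pair bit set, 147 and 163 the second-in-pair bit, and none of the four includes 0x100, which is what the question is about.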

That file appears to be broken; don't try too hard to make sense of it.


Ok. Thank you!

9.3 years ago
  1. It's the total number of occurrences for that mate. So in your example, both mates should have NH:i:2.
  2. There should be only one primary alignment for each mate; the others should be flagged as secondary.
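
Both points can be checked mechanically. The sketch below is only an illustration under stated assumptions: the helper `summarize` is hypothetical, and its input is a hand-made list of `(qname, flag, nh)` tuples summarizing the twelve SRR031714.1000001 records posted in the comments (two with flag 83, four with 99, four with 147, two with 163, all tagged NH:i:6). It counts records per mate and how many of them lack both the secondary (0x100) and supplementary (0x800) bits, i.e. would be read as primary.

```python
from collections import Counter

FIRST_IN_PAIR = 0x40
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def summarize(records):
    """Summarize (qname, flag, nh) tuples per (read name, mate number).

    Hypothetical helper for illustration; returns, for each mate, the
    number of records, the number that look primary, and the NH values seen.
    """
    total = Counter()
    primary = Counter()
    nh_seen = {}
    for qname, flag, nh in records:
        mate = 1 if flag & FIRST_IN_PAIR else 2
        key = (qname, mate)
        total[key] += 1
        # A record is primary when neither 0x100 nor 0x800 is set.
        if not flag & (SECONDARY | SUPPLEMENTARY):
            primary[key] += 1
        nh_seen.setdefault(key, set()).add(nh)
    return {
        key: {
            "records": total[key],
            "primary": primary[key],
            "nh": sorted(nh_seen[key]),
        }
        for key in total
    }


# Flags of the twelve records from the comment thread, all with NH:i:6.
flags = [83] * 2 + [99] * 4 + [147] * 4 + [163] * 2
records = [("SRR031714.1000001", flag, 6) for flag in flags]

for key, info in summarize(records).items():
    print(key, info)
```

For that excerpt, each mate has six records, which agrees with NH:i:6 being a per-mate count (point 1), but all six of each mate look primary, which is not what point 2 would lead you to expect.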

How do I tell which one is the primary alignment when bit 0x100 is not set for any of them?
