Hi all:
I recently got confused by two SAM flags in my BWA alignments: "supplementary alignment", which comes from chimeric alignments, and "not primary alignment" (or "secondary alignment"), which comes from multiple mapping.
The SAM specification (https://samtools.github.io/hts-specs/SAMv1.pdf) explains these two flags as follows:
A chimeric alignment is primarily caused by structural variations, gene fusions, misassemblies, RNA-seq or experimental protocols. It is more frequent given longer reads. For a chimeric alignment, the linear alignments consisting of the alignment are largely non-overlapping. Typically, one of the linear alignments in a chimeric alignment is considered the "representative" alignment, and the others are called "supplementary" and are distinguished by the supplementary alignment flag.
In contrast, multiple mappings are caused primarily by repeats. They are less frequent given longer reads. If a read has multiple mappings, all these mappings are almost entirely overlapping with each other. In multiple mapping, one of these alignments is considered "primary". All the other alignments have the "secondary" alignment flag set in the SAM records that represent them.
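For reference, both flags are single bits of the SAM FLAG field: 0x100 for "secondary" and 0x800 for "supplementary". Here is a minimal Python sketch that decodes a FLAG value into its bits. The bit values come from the SAM specification; the short names and the helper functions (`decode_flag`, `is_supplementary`, `is_secondary`) are just illustrative choices, not part of any library.

```python
# Bit values follow the FLAG table in the SAM specification.
# The short names are my own labels (an assumption), not official identifiers.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def is_supplementary(flag):
    """True if the 0x800 (supplementary alignment) bit is set."""
    return bool(flag & 0x800)


def is_secondary(flag):
    """True if the 0x100 (not primary / secondary alignment) bit is set."""
    return bool(flag & 0x100)


if __name__ == "__main__":
    # The four FLAG values from the example records in this post.
    for flag in (163, 2131, 2211, 83):
        print(flag, decode_flag(flag))
```

With this, 2131 and 2211 decode to the same bits as 83 and 163 plus "supplementary", and none of the four has the "secondary" bit set.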
However, in my ChIP-seq alignments from BWA (run without the -M option), the alignments flagged "supplementary" overlap the "representative" alignments, so by the description above I would expect them to be "secondary" alignments instead. For example, I got four alignments for one pair of reads:
HWI-C00135:237:CAR2BANXX:1:1101:6737:91207 163 chr6 144444720 60 61M40S = 144444728 61 GTACACACATATACACAGTGCTAAGTTCATTGTACACACATATACACAGTGCTAACTTCATTGTACACACATATACACAGTGCTAAGTTCATTGTACACAC BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:1 MD:Z:1C59 AS:i:59 XS:i:0 SA:Z:chr6,144444722,+,33S59M9S,60,2;
HWI-C00135:237:CAR2BANXX:1:1101:6737:91207 2131 chr6 144444722 11 56H45M = 144444720 -47 ACACACATATACACAGTGCTAAGTTCATTGTACACACATATACAC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB NM:i:0 MD:Z:45 AS:i:45 XS:i:20 SA:Z:chr6,144444728,-,53M48S,11,0;
HWI-C00135:237:CAR2BANXX:1:1101:6737:91207 2211 chr6 144444722 60 33H59M9H = 144444728 59 ACACACATATACACAGTGCTAACTTCATTGTACACACATATACACAGTGCTAAGTTCAT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:2 MD:Z:22G30C5 AS:i:49 XS:i:0 SA:Z:chr6,144444720,+,61M40S,60,1;
HWI-C00135:237:CAR2BANXX:1:1101:6737:91207 83 chr6 144444728 11 53M48S = 144444720 -61 ATATACACAGTGCTAAGTTCATTGTACACACATATACACAGTGCTAACTTCATTGTACACACATATACACAGTGCTAAGTTCATTGTACACACATATACAC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB NM:i:0 MD:Z:53 AS:i:53 XS:i:49 SA:Z:chr6,144444722,-,56S45M,11,0;
The 2nd and 3rd alignments, with flags 2131 and 2211, are marked "supplementary", yet they are fragments of the other two full-length alignments. I didn't find any reads with the "secondary" flag in my results, and all the "supplementary" alignments I checked look like the example above.
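To make the overlap concrete, here is a rough Python sketch that turns POS and CIGAR into a reference interval and a read interval, then measures how much two records overlap in each. The function names (`alignment_spans`, `overlap`) are made up for this illustration. Intervals are half-open, and read offsets are counted in the orientation of the SAM record, with leading soft or hard clips giving the start offset.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_spans(pos, cigar):
    """Return ((ref_start, ref_end), (read_start, read_end)) as half-open intervals.

    ref_start is the 1-based POS from the SAM record; read offsets are
    0-based and include hard-clipped bases, so they refer to the full read.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Bases clipped (soft or hard) before the first aligned base.
    lead = 0
    for n, op in ops:
        if op not in "SH":
            break
        lead += n
    ref_len = sum(n for n, op in ops if op in "MDN=X")   # ops consuming reference
    read_len = sum(n for n, op in ops if op in "MI=X")   # aligned read bases
    return (pos, pos + ref_len), (lead, lead + read_len)


def overlap(a, b):
    """Length of the overlap between two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


if __name__ == "__main__":
    # Read 2 of the pair above: representative (flag 163) and supplementary (flag 2211).
    rep_ref, rep_read = alignment_spans(144444720, "61M40S")
    sup_ref, sup_read = alignment_spans(144444722, "33H59M9H")
    print(overlap(rep_ref, sup_ref))    # 59 reference bases shared
    print(overlap(rep_read, sup_read))  # 28 read bases shared
```

For these two records the supplementary alignment lies entirely inside the reference interval of the representative one (59 shared bases), but the two cover mostly different parts of the read (only 28 shared bases).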
Can anyone help explain this? Should I remove these "supplementary" alignments to keep uniquely mapped reads? Thanks very much.
Best, Vanilla
Thanks Devon!
I agree. I just checked, and there is a repeat, "ACACACATATACACAGTGCTAAGTTCATTGT", around chr6:144444722 on mm10, which causes the multiple alignments for these reads.
I also have a lot of cases where the supplementary alignment maps elsewhere, overlapping the primary alignment in read sequence but not in genome coordinates, like the following (the first alignment, with flag 2227, is the supplementary alignment):
What do you think about such cases? Thanks!
Supplementary alignments are for any case where subsets of a read can be aligned in biologically impossible ways, so that makes sense. I wouldn't bother even considering the supplementary alignment.
Got it. So I will also just throw them away.
If you also have secondary alignments from multiple mapping (e.g. due to repeat sequences), should they also be discarded for ChIP-seq analysis? I'm also curious why I didn't get any secondary alignments.
Most aligners don't return secondary alignments by default. Yes, they generally get ignored (though not always, e.g. if you're interested in something overlapping repeats).
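If you want to drop both kinds in a script, a minimal sketch in plain Python could look like this. It assumes tab-separated SAM text as input; `keep_primary` is a hypothetical helper name, and the demo records are shortened stand-ins rather than real alignments. On the command line, `samtools view -F 0x900` excludes the same two bits.

```python
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def keep_primary(sam_lines, drop_mask=SECONDARY | SUPPLEMENTARY):
    """Yield header lines and every record with none of the drop_mask bits set."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        flag = int(line.split("\t", 2)[1])  # FLAG is the second SAM column
        if (flag & drop_mask) == 0:
            yield line


if __name__ == "__main__":
    # Shortened, made-up records: only QNAME, FLAG, RNAME and POS are filled in.
    demo = [
        "@SQ\tSN:chr6\tLN:149736546",
        "read1\t163\tchr6\t144444720",
        "read1\t2131\tchr6\t144444722",
        "read1\t2211\tchr6\t144444722",
        "read1\t83\tchr6\t144444728",
    ]
    for line in keep_primary(demo):
        print(line)
```

Only the header and the records with flags 163 and 83 come through; the 2131 and 2211 records are dropped.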
Got it. Thanks very much for your help Devon!