Hello all,
I just wanted to verify that I got this right: in a sequencing experiment, my SAM file contains lines like these:
read1 16 reference 7695 255 36M15D69M * 0 0 GATAGCATTGGGAGATATACCTAATGCTAGATGACGGGGTGAACATTAGTGGGTGCAGCGCACAAGCATGGCACATGTATACATATGTAACTAACCTGCACAATG HHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHGGGGGFHHHHHHHHHHHGGGHHHGGGGGHHHHHHHHHHHHHHHHHHHHHHHHGGGGGGGGGGCFFFFFFCCCCC NH:i:1 HI:i:1 AS:i:65 nM:i:3 NM:i:18 MD:Z:36^TCCATACTGAGAATC0A0T2T64 jM:B:c,-1 jI:B:i,-1
read2 16 reference 7695 255 35M33S * 0 0 GATAGCATTGGGAGATATACCTAATGCTAGATGACACGAGTAACATTAGTGGGTGCAGCGCACAAGCA HHHHHHHHHGBHHHHHGD5FHDEFGHHHGHHFFGFCEHDGHHEHHHGGGGFGGGGGBDA5C4FA>>3> NH:i:1 HI:i:1 AS:i:34 nM:i:0 NM:i:0 MD:Z:35 jM:B:c,-1 jI:B:i,-1
read3 16 reference 7751 255 41S39M * 0 0 GATAGCATTGGGAGGTATACCTAATGCTAGATGACCTTACGAACATTAGTGGGTGCAGCGCACAAGCATGGCACATGTAT HHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHHGHGHHGGHHHHHHHHGGGHHHGGGGGGGGGGGGGGFFFFFFFBBBBB NH:i:1 HI:i:1 AS:i:38 nM:i:0 NM:i:0 MD:Z:39 jM:B:c,-1 jI:B:i,-1
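As a sketch of how these columns map onto the 11 mandatory SAM fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL), the Python below splits one alignment line. It assumes the file is tab-delimited, as the SAM specification requires; the spaces in the lines above are presumably how the forum displayed the tabs. POS, the 1-based left-most reference position of the alignment, is the 4th column, and it is the left-most position even for reverse-strand reads such as these (FLAG 16). `parse_sam_line` is a hypothetical helper, not a library function.

```python
# The 11 mandatory SAM columns, in order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_line(line):
    """Split one tab-delimited SAM alignment line into a dict (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, cols[:11]))
    record["FLAG"] = int(record["FLAG"])
    record["POS"] = int(record["POS"])    # column 4: 1-based left-most reference position
    record["MAPQ"] = int(record["MAPQ"])
    record["TAGS"] = cols[11:]             # optional TAG:TYPE:VALUE fields
    return record


# read2 from above; replace() restores the tabs, which is safe only because
# none of these particular fields contains a space.
line = ("read2 16 reference 7695 255 35M33S * 0 0 "
        "GATAGCATTGGGAGATATACCTAATGCTAGATGACACGAGTAACATTAGTGGGTGCAGCGCACAAGCA "
        "HHHHHHHHHGBHHHHHGD5FHDEFGHHHGHHFFGFCEHDGHHEHHHGGGGFGGGGGBDA5C4FA>>3> "
        "NH:i:1 HI:i:1 AS:i:34 nM:i:0 NM:i:0 MD:Z:35").replace(" ", "\t")
rec = parse_sam_line(line)
print(rec["POS"], rec["CIGAR"])  # 7695 35M33S
```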
I wanted to clarify two things: (a) the left-most position of my alignment against the reference is the 3rd column, correct? (b) If I want to know the right-most position, i.e. where the alignment ends, can I just add up the numbers in my CIGAR strings? For example, for read1 it would be 36+15+69, for read2 35+33 and for read3 41+39. That makes sense to me because for read2 and read3 the sums equal the read length (68 and 80, respectively), while for read1 the numbers in the CIGAR string add up to 120 although the read is only 105 bases long; but I know there is a deletion there, so that is fine.
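Summing every CIGAR number mixes operations that consume the read with operations that consume the reference. Per the SAM specification, M, I, S, = and X consume query (read) bases, while M, D, N, = and X consume reference bases; H and P consume neither. The sketch below (hypothetical helper names, standard library only) computes both totals and takes the right-most position as POS + reference span - 1. Applied to the three reads above, it shows that the sums of 68 and 80 match the read length only because `35M33S` and `41S39M` contain nothing but M and S, and soft-clipped bases do not cover the reference.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_OPS = set("MIS=X")   # operations that consume read (query) bases
REF_OPS = set("MDN=X")     # operations that consume reference bases


def parse_cigar(cigar):
    """'36M15D69M' -> [(36, 'M'), (15, 'D'), (69, 'M')]; raises on anything else, including '*'."""
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"unsupported CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def query_length(cigar):
    return sum(n for n, op in parse_cigar(cigar) if op in QUERY_OPS)


def reference_span(cigar):
    return sum(n for n, op in parse_cigar(cigar) if op in REF_OPS)


def reference_end(pos, cigar):
    """1-based, inclusive right-most reference position of the alignment."""
    return pos + reference_span(cigar) - 1


for name, pos, cigar in [("read1", 7695, "36M15D69M"),
                         ("read2", 7695, "35M33S"),
                         ("read3", 7751, "41S39M")]:
    print(name, query_length(cigar), reference_span(cigar), reference_end(pos, cigar))
# read1 105 120 7814
# read2 68 35 7729
# read3 80 39 7789
```

Under this reading read2 ends at 7729 rather than at 7695 + 68, and for read3 the 41 soft-clipped bases lie outside the alignment, so POS 7751 is already its first aligned base.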
I would be grateful if someone could tell me whether I am doing this correctly.
Many thanks!
Many thanks, Devon (yes, I meant to write column 4 :) )
I also saw this one:
and the read is
so is the total length here 35+9 or 35+1349+9? I am a bit confused.
Yes, it's a spliced read.
So it is 35 + 9 and the 1349 is ignored, right? Or not?
1349 is not ignored. If it helps, have a look at the file in IGV.
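To illustrate why both numbers come up for a spliced read: an N operation skips reference bases (an intron) without consuming any read bases. The thread's actual CIGAR line is not shown here, so this sketch assumes a hypothetical `35M1349N9M` built from the numbers under discussion.

```python
import re

# Hypothetical CIGAR: the spliced read's actual line is not shown in the thread,
# so this is only an example built from the numbers 35, 1349 and 9.
cigar = "35M1349N9M"
ops = [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

read_bases = sum(n for n, op in ops if op in "MIS=X")  # read bases consumed; here 35 + 9
ref_span = sum(n for n, op in ops if op in "MDN=X")    # reference covered, intron included
print(read_bases, ref_span)  # 44 1393
```

So 35 + 9 is how many read bases the CIGAR accounts for, and 35 + 1349 + 9 is how much reference the alignment spans; for the end coordinate it is the latter that matters.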
OK, so I only exclude I, S and H, as you initially wrote. In this example, 35 bases align at one point and then 9 more after 1349 reference bases that are skipped. Is it then correct to say that the alignment starts at, e.g., position 7751 and finishes at position 7751 + 35 + 1349 + 9?
7751 + 35 + 1349 + 9 - 1, since otherwise you're double counting a base.
Many thanks for your help!
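The final formula can be checked by walking the CIGAR and recording each aligned block separately, similar to how a genome browser such as IGV draws a split read as blocks joined by a line. This is a sketch: `aligned_blocks` is a hypothetical helper, `35M1349N9M` is again an assumed CIGAR because the original line is not shown, and keeping deletions inside a block (splitting only at N) is a design choice rather than a standard.

```python
import re


def aligned_blocks(pos, cigar):
    """Return 1-based, inclusive (start, end) reference blocks, split at N operations.

    M, D, = and X advance along the reference inside the current block;
    N closes the block and skips the intron; I, S, H and P do not move along the reference.
    """
    blocks = []
    ref = pos      # next reference position (1-based)
    start = pos    # first reference position of the current block
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "N":
            if ref > start:
                blocks.append((start, ref - 1))
            ref += length
            start = ref
        elif op in "MD=X":
            ref += length
    if ref > start:
        blocks.append((start, ref - 1))
    return blocks


blocks = aligned_blocks(7751, "35M1349N9M")  # hypothetical CIGAR, see above
print(blocks)         # [(7751, 7785), (9135, 9143)]
print(blocks[-1][1])  # 9143
```

The last block ends at 9143, which matches 7751 + 35 + 1349 + 9 - 1: the start position is itself the first aligned base, so adding the full span without subtracting 1 would land one base past the end.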