9.4 years ago
ashishtx
Hello everyone,
I am trying to understand the SAM format specification together with the BWA program. The reference sequence length reported in the SAM header does not seem to match the length of my reference.
So my question is: why does the length of SEQ1, which I count as 407 bp, not match the SAM header, which gives the reference length as 402 bp?
Am I missing something very basic?
Thank you
>SEQ1
ATGCAGCTGTTCATCCACTGTCAAGGGGTTCATACCGTTGAAGTTACAGGTGAAGAGGAAGTTGCTTTCC
TCAAGCAATACCTCGAGCAGGCCGAGGGCATTGCACCTGCTGATCAAGTCCTCTACCATTCTGGCAAGCC
CCTGAGCGACGAGCTTTCTCTCTCCTGCCTGGAGAATGGTGCTTATGTTGAAGCTGTCGTCCCTCTTCTT
GGAGGTAAGGTCCATGGCTCCCTGGCTCGTGCCGGCAAGGTCAAGGGCCAGACACCGAAGGTAGAGAAAC
AGGAGAAGCGCAAGAAGAAGACCGGCCGTGCCCAGAGGCGCATGCAGTACAACAGGCGGGTCGTGAATGC
CGTTGCCACCTTCGGGCGCANGAGAGGACCCAATGCAAACCAAACTGCATAG
SAM file header and alignment record:
@SQ SN:SEQ1 LN:402
NODE32439length524cov2064.38ID64877 0 SEQ1 1 60 61S161M3I241M58S * 0 0 CAGCATTTTTTTTGTTATTTGGTTCGTGGGTTGCTGGACGTGTGTACACGTTTGCAAGAAGATGCAGCTGTTCATTCACTGTCAAGAAGTTCACACCGTAGAAGTTACAGGCGACGAGAATGTCGCCTTCCTCAAGGAAGTTCTTGAGCAGGCCGAAGGCATTGCACCTGTTGATCAGGTCCTCTACAACTCTGGCAAGCCCCTGAGTGATGATGTTTCTCTGTCCTCCTGCCTTGAGGATGGTGCTCATGTCGAGGCCGTTGTTCCTCTGCTCGGAGGTAAGGTCCACGGCTCACTGGCTCGTGCTGGCAAAGTGAAGGGCCAGACACCGAAGGTGGAGAAACAGGAGAAACGCAAGAAGAAGACTGGCCGTGCCAAGAGGCGCATGCAGTACAACAGGCGGTTTGTGAATGCTGTTGCCACCTTTGGCCGCAGGAGGGGACCCAATGCAAACCAAACTTCATAGAGAGATGGGCCTGTGACAAATAAAATTTGTATGGTGCGTTCCTGGACGTGGTGCTCAC * NM:i:55 MD:Z:14C10G0G5T5T11T2A3G1A2T2T9C2T0A0C2C11G13C6A9C1T17C2C2G0C16G3A8T4T2A2T2C2C5T2T14T5C11C5G2C20A14G14C9C26G1C8C11C2G4C3A21G5 AS:i:133 XS:i:0
Your sequence, as shown in this post, really is 402 bp.
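The alignment record itself offers a cross-check: the CIGAR operations that consume the reference should add up to the aligned reference span. Below is a minimal sketch, not part of BWA or samtools; the helper name `cigar_spans` is made up for illustration, and the consumption rules are taken from the SAM specification (M, D, N, = and X consume the reference; M, I, S, = and X consume the query).

```python
import re

# One CIGAR element: a count followed by a single operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

REF_OPS = set("MDN=X")    # operations that consume reference bases
QUERY_OPS = set("MIS=X")  # operations that consume query (read) bases


def cigar_spans(cigar):
    """Return (reference_span, query_length) implied by a CIGAR string."""
    ref_len = 0
    query_len = 0
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op in REF_OPS:
            ref_len += n
        if op in QUERY_OPS:
            query_len += n
    return ref_len, query_len


# CIGAR string copied from the alignment record in the question.
print(cigar_spans("61S161M3I241M58S"))  # (402, 524)
```

For this record the reference span is 161 + 241 = 402, matching LN:402, and the query length is 524, matching the "length524" in the contig name.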
Whoops. You guys are right. I was using Sublime Text to count characters, and it kept showing 407. Thanks, Ashutosh Pandey and Heng Li (I really admire your software).
It's an honor to be mentioned in the same line as Dr. Li :-)
The length of the reference sequence doesn't include its header. I guess you are counting
>SEQ1
as part of the length, which is wrong.
I am pretty sure I did not include the header. Thanks for the response.
If you are using a script, you may have forgotten to strip the "\n" characters before counting.
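A minimal sketch of counting only the bases, assuming a plain-text FASTA held in a string; the function name `fasta_lengths` is hypothetical and not from any library. It skips header lines and strips line breaks before counting. Note that the sequence in the question spans six lines, so a raw character count picks up five line breaks: 402 + 5 = 407, which would be consistent with the number the editor reported.

```python
def fasta_lengths(text):
    """Return {record_name: sequence_length}, ignoring headers and line breaks."""
    lengths = {}
    name = None
    for line in text.splitlines():
        line = line.strip()  # drops "\n", "\r" and surrounding spaces
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the record name is the first word after ">".
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Toy record (not the sequence from the question): 10 bases over 3 lines.
record = ">SEQ1\nACGT\nACGT\nAC\n"
print(fasta_lengths(record))  # {'SEQ1': 10}
```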