I am learning about the different NGS file formats. Most of them are quite easy to understand, at least in their general aspects. However, when I try to read the SAM files generated in my lab, I can't easily understand the different fields.
The first line of the body of one of my SAM files looks like this:
M00321:561:000000000-JM5F9:1:2107:12468:12982 65 1 14588 9 117M = 14588 0 CCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCTGGGGGGGAAGGTGTCATGGAGCCCCCTACGATTC CCCCCGCFFEEG7@@FFFGGFFCF<<FFGGFGFEEFD<FEFGGF@FGGFGEEFFGGFFF<EAFGGGGGG7@@@EF,C<CECEGCFGCCFE:<C>FFFCFF99:<,8<:*C@C:7*CF NM:i:0 MD:Z:117 MC:Z:117M AS:i:117 XS:i:112 RG:Z:1_2 XA:Z:15,-102516461,117M,1;9,+14699,117M,2;2,-114356309,117M,3;12,-90921,117M,3;
And this is a table explaining the SAM format:
The things I don't understand are:
- The second column (FLAG) makes no sense to me according to the table.
- What exactly does the third column mean (the number 1 in my example)? It should be a string, shouldn't it?
- Finally, why does my sequence have a white space, then more sequence, then a couple of @, more sequence, and then a couple of < plus other characters that are not part of a sequence?
FWIW, I'd check the command line used to do the alignment. Based on the contents of that line of the SAM file, I suspect there may be an error.
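On the three points in the question, a small sketch may help. FLAG is a bitwise sum, so 65 = 1 + 64, which reads as "read paired" plus "first in pair". The third column (RNAME) is the reference sequence name, and it is a string: "1" is most likely just how chromosome 1 is named in the reference FASTA that was used. The "white space" is the column separator between SEQ and QUAL, so everything after it (the @, <, and so on) is the per-base quality string, normally Phred+33 encoded. The Python below is only an illustration under those assumptions: the function names are made up for this example, and SEQ and QUAL are truncated to 10 characters for readability, so they no longer match the 117M CIGAR.

```python
# SAM FLAG bits, as listed in the SAM specification.
FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]

# The 11 mandatory, tab-separated columns of a SAM alignment line.
MANDATORY = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
             "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def decode_flag(flag):
    """Return the names of all bits that are set in a FLAG value."""
    return [name for bit, name in FLAG_BITS if flag & bit]


def parse_sam_line(line):
    """Split one alignment line into mandatory fields plus optional tags."""
    cols = line.rstrip("\n").split("\t")
    record = dict(zip(MANDATORY, cols[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])
    record["TAGS"] = cols[11:]  # e.g. NM:i:0, MD:Z:117, ...
    return record


def qual_to_phred(qual):
    """Convert a QUAL string to Phred scores, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in qual]


# First nine fields copied from the record in the question;
# SEQ and QUAL are truncated here (illustration only).
example = "\t".join([
    "M00321:561:000000000-JM5F9:1:2107:12468:12982",
    "65", "1", "14588", "9", "117M", "=", "14588", "0",
    "CCGTCACCCC", "CCCCCGCFFE", "NM:i:0",
])

record = parse_sam_line(example)
print(decode_flag(record["FLAG"]))   # ['read paired', 'first in pair']
print(repr(record["RNAME"]))         # '1'
print(qual_to_phred("@<"))           # [31, 27]
```

In a real SAM file the columns are separated by tabs, which is why the sketch splits on "\t" rather than on spaces.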