I can't understand my sam files
2.8 years ago
ManuelDB ▴ 110

I am learning about the different NGS file formats. Most files are quite easy to understand, at least in their general aspects. However, when I try to read the SAM files generated in my lab, I can't easily make sense of the different fields.

The first line of the body of one of my SAM files looks like this:

M00321:561:000000000-JM5F9:1:2107:12468:12982   65      1       14588   9       117M    =       14588   0       CCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCTGGGGGGGAAGGTGTCATGGAGCCCCCTACGATTC CCCCCGCFFEEG7@@FFFGGFFCF<<FFGGFGFEEFD<FEFGGF@FGGFGEEFFGGFFF<EAFGGGGGG7@@@EF,C<CECEGCFGCCFE:<C>FFFCFF99:<,8<:*C@C:7*CF   NM:i:0  MD:Z:117        MC:Z:117M       AS:i:117        XS:i:112        RG:Z:1_2        XA:Z:15,-102516461,117M,1;9,+14699,117M,2;2,-114356309,117M,3;12,-90921,117M,3;

And this is a table explaining the SAM format:

[Image: table describing the SAM fields]

Things I don't understand are

  1. Second column (FLAG) makes no sense to me according to the next table

[Image: table of SAM FLAG values]

  2. What exactly does the third column mean (in my example, the number 1)? It should be a string, shouldn't it?
  3. Finally, why does my sequence have one "white space", then more sequence, then a couple of @, more sequence, and then a couple of << plus other characters that are not part of the sequence?
NGS SAM • 1.3k views

FWIW, I'd check the command line you used to do the alignment. Based on the contents of that line of the SAM file, I suspect there may be an error.

2.8 years ago
GenoMax 147k

Second column (FLAG) makes no sense to me according to the next table

Those numbers are additive: the value in the FLAG column is the sum of the values of every flag that applies to the read.

You can use https://broadinstitute.github.io/picard/explain-flags.html to get more information about the SAM flags you see in your file. Simply enter the bitwise flag number and hit "explain".

65 = 1 (read paired) + 64 (first in pair), i.e. this is the first read of a paired-end read pair.

https://samformat.info/sam-format-flag is another site you can use.
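The same decoding can be done locally. Below is a minimal Python sketch, not taken from either website: the helper name `explain_flag` and the table `SAM_FLAG_BITS` are made up for illustration, while the bit values and their meanings follow the SAM specification. Each bit is tested with a bitwise AND, which is what "additive" means in practice.

```python
# Bit values defined by the SAM specification, with short descriptions.
SAM_FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails platform/vendor quality checks"),
    (0x400, "read is PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def explain_flag(flag):
    """Return the description of every bit that is set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


print(explain_flag(65))  # prints ['read paired', 'first in pair']
```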

What exactly does the third column mean (in my example, the number 1)? It should be a string, shouldn't it?

That is the reference sequence name (RNAME). In this case the chromosome is simply called 1, and that "1" is a string.

why my sequence has one "white space",

I don't think it does. What follows the space is the string of Phred quality scores, one character per base. You can compare the length of the sequence with the length of the scores.
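That length comparison can be sketched in a few lines of Python. This is only an illustration: the eleven mandatory fields of the record from the question are re-joined with tabs (real SAM files are tab-delimited), the optional tags are left out, and the variable names are arbitrary. The quality decoding assumes the Phred+33 encoding that the SAM specification prescribes for the QUAL column.

```python
# The 11 mandatory fields of the record from the question, joined with tabs.
# Optional tags (NM, MD, XA, ...) are omitted to keep the sketch short.
record = "\t".join([
    "M00321:561:000000000-JM5F9:1:2107:12468:12982",
    "65",
    "1",
    "14588",
    "9",
    "117M",
    "=",
    "14588",
    "0",
    "CCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCTGGGGGGGAAGGTGTCATGGAGCCCCCTACGATTC",
    "CCCCCGCFFEEG7@@FFFGGFFCF<<FFGGFGFEEFD<FEFGGF@FGGFGEEFFGGFFF<EAFGGGGGG7@@@EF,C<CECEGCFGCCFE:<C>FFFCFF99:<,8<:*C@C:7*CF",
])

FIELD_NAMES = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# Map each column name to its value.
fields = dict(zip(FIELD_NAMES, record.split("\t")))
seq, qual = fields["SEQ"], fields["QUAL"]

# Phred+33: quality score = ASCII code of the character minus 33.
phred = [ord(ch) - 33 for ch in qual]

print(len(seq), len(qual))  # the two lengths should be equal
print(phred[:5])            # quality scores of the first five bases
```

Characters such as `@`, `<` and `,` are therefore ordinary quality values, not stray bits of sequence, and the shared length should agree with the `117M` CIGAR string.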


Many thanks for your comments
