SAM file size after STAR alignment
7.4 years ago
xqyjxau ▴ 50

My RNA-Seq data are in FASTQ format (decompressed from fastq.gz). I used STAR 2.5.3a to map the reads against an already indexed reference genome, and the run appeared to go fine. However, the size of the generated SAM files seems strange: my input FASTQ files are about 1.3-1.5 GB, but the SAM files range from 3.8 GB to 4.5 GB. Is that normal? If not, what could be wrong?

RNA-Seq alignment • 5.8k views

Have you checked the STAR logs to see whether any errors were reported and what the alignment percentages look like? If there were no errors, the resulting SAM file should be fine.
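For anyone wanting to pull the alignment percentages out of the log programmatically, here is a minimal sketch. It assumes the summary file STAR writes (`Log.final.out`) uses `key | value` lines; the example text and its numbers are made up for illustration, and `parse_star_final_log` is a hypothetical helper, not part of STAR.

```python
def parse_star_final_log(text):
    """Collect 'key | value' lines from a STAR Log.final.out-style summary."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip section headers and blank lines
        key, _, value = line.partition("|")
        stats[key.strip()] = value.strip()
    return stats


# Hypothetical excerpt; real values come from your own Log.final.out.
example = """\
                          Number of input reads |\t1000000
                        Uniquely mapped reads % |\t63.50%
             % of reads mapped to multiple loci |\t28.00%
"""

stats = parse_star_final_log(example)
print(stats["Uniquely mapped reads %"])
print(stats["% of reads mapped to multiple loci"])
```

With a real run you would read the file contents (for example with `open("Log.final.out").read()`) and pass them to the same function.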


Uniquely mapped reads are 63%-64%, and multi-mapped reads are 27% to 32%. Is this OK?


There are so many variables here that it's impossible to say. If you didn't get an error, then presumably you're fine. The file size increase is perfectly normal. But by the sounds of things, you really should look into pairing up with someone who knows what is going on to teach you the ropes :)

7.4 years ago

A SAM file will typically be larger than a FASTQ file because, in general, it contains all the information of the FASTQ plus a lot of other information.

In addition, each FASTQ record may produce more than one alignment (a multi-mapped read gets one SAM line per reported location), so the SAM file can easily grow to be much larger than the original data.
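To make the two points above concrete, here is a rough sketch that builds one FASTQ record and the matching SAM line(s) and compares their sizes. Everything in it is illustrative: the read name, chromosome, positions, flag and MAPQ values, and tag values are invented, and the four optional tags (NH, HI, AS, nM) are assumed to match STAR's standard attribute set.

```python
def fastq_record(name, seq, qual):
    """One FASTQ record: header, sequence, separator, qualities."""
    return f"@{name}\n{seq}\n+\n{qual}\n"


def sam_record(name, seq, qual, chrom, pos, n_hits, hit_index):
    """One SAM alignment line: 11 mandatory fields plus optional tags.

    All field values here are hypothetical placeholders.
    """
    flag = 0 if hit_index == 1 else 256   # 256 marks a secondary alignment
    mapq = 255 if n_hits == 1 else 3      # illustrative MAPQ values
    fields = [
        name, str(flag), chrom, str(pos), str(mapq),
        f"{len(seq)}M", "*", "0", "0", seq, qual,
        f"NH:i:{n_hits}", f"HI:i:{hit_index}", "AS:i:98", "nM:i:0",
    ]
    return "\t".join(fields) + "\n"


seq = "ACGT" * 25          # a 100 bp read
qual = "I" * len(seq)

fastq_size = len(fastq_record("read1", seq, qual))

# Uniquely mapped read: one SAM line.
unique_size = len(sam_record("read1", seq, qual, "chr1", 12345, 1, 1))

# Read mapped to two loci: two SAM lines, each repeating sequence and qualities.
multi_size = sum(
    len(sam_record("read1", seq, qual, chrom, pos, 2, i))
    for i, (chrom, pos) in enumerate([("chr1", 12345), ("chr7", 67890)], start=1)
)

print(fastq_size, unique_size, multi_size)
```

Even the single alignment is longer than the FASTQ record because of the extra columns and tags, and the two-locus case more than doubles it. Real ratios depend on read length, read-name length, and how many alignments are reported per read.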
