Sorting an already sorted file with samtools reduced its size
5.4 years ago
shhsvri • 0

Hi all

I used the

STAR ... --outSAMtype BAM   SortedByCoordinate ...

option while mapping my paired-end .fastq files.

I then sorted the output with samtools:

samtools sort -O bam -o STAR_output.sorted.bam STAR_output.bam

The BAM file shrank by 25%, from 8G to 6G, after sorting with samtools, which sorts by coordinate (position) by default.

I assume the sorted BAM file from STAR has the same content as the samtools-sorted BAM file, and that the latter is simply more compressed. Is that true?

RNA-Seq alignment • 1.8k views

There are several compression levels for BAM. Maybe samtools and STAR use different defaults.

But why do you use samtools sort if STAR already gives you a coordinate-sorted BAM file?

5.4 years ago
h.mon 35k

I assume you are correct: the documented default compression levels of both tools are in line with your hypothesis.
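If you would rather check than assume, one option is to compare the alignment records themselves and ignore both the header and the order of reads that share a position. The sketch below is only an illustration: `record_digest` is a hypothetical helper name, and it assumes SAM text (for example from `samtools view`) is piped in on stdin.

```shell
# record_digest: hypothetical helper, not part of samtools or STAR.
# Reads SAM text on stdin and prints a checksum of the alignment records,
# independent of header lines and of record order.
record_digest() {
    # Header lines start with "@"; read names cannot, so this drops only headers.
    # Sorting normalizes the order of reads that share the same coordinate.
    grep -v '^@' | LC_ALL=C sort | cksum
}

# Assumed usage with the file names from the question (needs samtools on PATH):
#   samtools view STAR_output.bam        | record_digest
#   samtools view STAR_output.sorted.bam | record_digest
```

If the two digests match, both files hold the same records and only the compression (and possibly the header) differs. On an 8G file the text sort can take a while.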

For samtools sort, the default compression level is -1, which means it uses the zlib default, which is usually 6. You can change the compression level with -l.

For STAR, the default compression level is 1, which is the lowest level that still compresses (0 means no compression). You can change the compression level with --outBAMcompression.
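To get a feel for what the level alone does, here is a small sketch that uses plain gzip as a stand-in, since BAM files are compressed with the same deflate algorithm. The records are made up for the demonstration and the file names are arbitrary; it does not touch your BAM files.

```shell
set -eu

# Build a repetitive, tab-separated text file as a stand-in for alignment records.
tmpdir=$(mktemp -d)
awk 'BEGIN {
    for (i = 1; i <= 20000; i++)
        printf "read%d\tchr1\t%d\t60\t100M\tACGTACGTACGTACGTACGT\n", i, i * 7
}' > "$tmpdir/records.txt"

# Compress the same bytes at level 1 (STAR's default) and level 6 (the usual zlib default).
gzip -1 -c "$tmpdir/records.txt" > "$tmpdir/level1.gz"
gzip -6 -c "$tmpdir/records.txt" > "$tmpdir/level6.gz"

size1=$(wc -c < "$tmpdir/level1.gz" | tr -d ' ')
size6=$(wc -c < "$tmpdir/level6.gz" | tr -d ' ')
echo "level 1: $size1 bytes"
echo "level 6: $size6 bytes"

# Both archives decompress to the same content; only the file size differs.
sum1=$(gzip -dc "$tmpdir/level1.gz" | cksum)
sum6=$(gzip -dc "$tmpdir/level6.gz" | cksum)
echo "level 1 content checksum: $sum1"
echo "level 6 content checksum: $sum6"

rm -rf "$tmpdir"
```

The level 6 file is usually the smaller one, while the decompressed content is identical. With the tools themselves, the equivalent would be passing `-l 1` to samtools sort or `--outBAMcompression 6` to STAR, which should bring the two file sizes much closer together.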
