Question

BAM File size increased after extracting unique reads


5.3 years ago

ilovesuperheroes1993 ▴ 40

Hi, I used the STAR aligner to map my reads, and the output BAM files were sorted by coordinate. I used the following command to extract unique reads from my BAM files:

samtools view -q 255 input_file.bam > unique_reads.bam

(A mapping quality, MAPQ, of 255 corresponds to unique alignments in STAR; the -q option filters on MAPQ, not on the SAM flag)

However, the sizes of my new BAM files have increased several-fold. (For example, a BAM file that was originally 500-900 MB has now become 2.5 GB.) This has happened for all the samples.

When I check the number of lines in the BAM files (the old one and the one containing the unique reads), the old file (say 500 MB) has 44 million lines, while the new file (say 2 GB) has 17 million lines. The line counts are as expected.

I have checked the headers of both BAM files, and both are sorted by coordinate.

So, could anyone tell me why the file containing fewer lines is so much larger?

BAM STAR Unique mapping Alignment Samtools • 1.5k views


score 3 · Answer 1 · 2019-08-11


5.3 years ago

michael.ante ★ 3.9k

Hi,

Without the -b option, samtools view writes an uncompressed SAM file, which is why the output is so much larger even though it holds fewer reads. Adding -b and -h to your command will produce a valid, compressed BAM file.
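For reference, here is a minimal sketch of the corrected command, wrapped in a small helper function. The function name extract_unique_bam is hypothetical and only there for illustration; the file names are the ones from the question, and the flags are the ones named above.

```shell
# Hypothetical helper (name is illustrative, not part of samtools):
# keep only alignments with MAPQ >= 255 and write them as compressed BAM.
#   $1 = input BAM, $2 = output BAM
extract_unique_bam() {
    # -b     write BAM instead of plain-text SAM
    # -h     include the header in the output
    # -q 255 skip alignments with a MAPQ below 255
    samtools view -b -h -q 255 "$1" > "$2"
}

# Usage with the file names from the question:
#   extract_unique_bam input_file.bam unique_reads.bam
```

The output should then be smaller than the input rather than several times larger, since it holds fewer reads and is compressed again.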

Best,

Michael



Agreed. Also, in the most recent samtools versions you would not even need to set any format flags, as samtools recognizes the output format from the file suffix if you use -o instead of redirecting stdout, e.g. samtools view -q 255 -o unique_reads.bam input_file.bam. With your current command, you produced a SAM file rather than a BAM file, and one without a header, because -h was missing. When using -b, -h is implied.
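A quick way to tell whether a file named .bam is really compressed is to look at its first two bytes. BAM is stored as BGZF, a gzip-compatible format, so a real BAM file starts with the gzip magic bytes 1f 8b, while a SAM file is plain text. The sketch below assumes head and od are available; the function name is_gzip_compressed is hypothetical, and the check only shows that a file is gzip-style compressed, not that it is a valid BAM.

```shell
# Hypothetical helper: succeed if the file starts with the gzip magic
# bytes (1f 8b), as BGZF-compressed BAM files do; plain-text SAM does not.
is_gzip_compressed() {
    # Read the first two bytes and print them as hex without spaces.
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Usage with the file name from the question:
#   if is_gzip_compressed unique_reads.bam; then
#       echo "compressed (BAM-like)"
#   else
#       echo "not compressed (probably SAM text)"
#   fi
```

Run against the unique_reads.bam produced by the original command, this check would be expected to report the file as not compressed.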

5.3 years ago by ATpoint 85k