BAM File size increased after extracting unique reads
5.3 years ago

Hi, I used the STAR aligner to map my reads, and the output BAM files were sorted by coordinate. I used the following command to extract unique reads from my BAM files:

samtools view -q 255 input_file.bam > unique_reads.bam

(A MAPQ of 255 corresponds to unique alignments in STAR; -q filters on mapping quality, not on the SAM flag.)

However, the sizes of my new BAM files have increased several-fold. For example, BAM files that were originally 500 MB to 900 MB are now about 2.5 GB. This has happened for all the samples.

When I check the number of lines in the BAM files (the original and the one containing the unique reads), the original file (say, 500 MB) has 44 million lines, while the new file (say, 2 GB) has 17 million lines. The line counts are as expected.

I have checked the headers of both BAM files, and both are sorted by coordinate.

So, could anyone tell me why the file containing fewer lines is so much larger?

BAM STAR Unique mapping Alignment Samtools • 1.5k views
5.3 years ago
michael.ante ★ 3.9k

Hi,

Without the -b option, samtools view writes uncompressed SAM text, which is why the file is larger despite containing fewer reads. Adding -b and -h to your command will produce a valid, compressed BAM file.

Best,

Michael


Agreed. In the most recent samtools versions you do not even need to set any format flags, because the output format is recognized from the file suffix if you use -o instead of redirecting stdout, as in samtools view -q 255 -o unique_reads.bam input_file.bam. With your current command you produced a SAM file instead of a BAM file, and without a header, because -h was missing. When -b is used, -h is implied.
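
For reference, the fix discussed in this thread could be sketched roughly as below, assuming the file names from the question. The helper is_gzip_like is a hypothetical name introduced here for illustration, not part of samtools. It only checks for the gzip magic bytes (1f 8b) that a BGZF-compressed BAM file starts with, so treat it as a quick sanity check rather than a full validation. The -b flag is kept alongside -o to be safe on older samtools versions; per the comment above, it may be redundant on recent ones.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if the file starts with the gzip magic
# bytes (1f 8b). BAM is BGZF-compressed (a gzip variant); SAM is plain text.
is_gzip_like() {
    local magic
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Only run the samtools steps if samtools and the input file are present.
if command -v samtools >/dev/null 2>&1 && [ -f input_file.bam ]; then
    # -b requests BAM output; -q 255 keeps reads with MAPQ >= 255.
    samtools view -b -q 255 -o unique_reads.bam input_file.bam

    if is_gzip_like unique_reads.bam; then
        echo "unique_reads.bam is compressed"
    else
        echo "unique_reads.bam looks like plain-text SAM"
    fi

    # Count alignment records instead of comparing file sizes.
    samtools view -c unique_reads.bam
fi
```

Running is_gzip_like on the file produced by the original command in the question should fail, since that file is plain-text SAM despite its .bam suffix.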
