I ran into an odd situation: I'm able to open a BAM file in a text editor. I know it sounds strange!
I'm working with the mitochondrial genome, and I had to extract the reads that align only to the mitochondrial reference genome (I'm using the revised Cambridge Reference Sequence, NC_012920). However, the resulting BAM file is not compressed. The file extension is .bam, but I'm able to open it in a text editor.
Here are the steps that produced this BAM file.
#I'm using hisat2 to align fastq file to mitochondrial reference genome (I indexed it with hisat2).
hisat2 -p 8 -x /home/user/Reference_genome/mitochondria/chrM -1 /home/user/mitochondrial_analysis/SRR6423582_R1.fastq.gz -2 /home/user/mitochondrial_analysis/SRR6423582_R2.fastq.gz > /home/user/mitochondrial_analysis/SRR6423582.sam
#Converting SAM to BAM. (In this step BAM is getting generated properly in compressed format....)
samtools view -h -S -b /home/user/mitochondrial_analysis/SRR6423582.sam > /home/user/mitochondrial_analysis/SRR6423582.bam
#Sorting BAM file. (BAM file is in correct format here as well..)
samtools sort /home/user/mitochondrial_analysis/SRR6423582.bam -o /home/user/mitochondrial_analysis/SRR6423582_sorted.bam
#Filtering out reads aligned only to mitochondrial genome. (Only in this last step, the generated BAM file is not in compressed format.)
samtools view -f 0x3 /home/user/mitochondrial_analysis/SRR6423582_sorted.bam > /home/user/mitochondrial_analysis/SRR6423582_MT.bam
At first I assumed that samtools had not compressed the file because it is small. However, when I passed this BAM file to another tool, it threw an exception: SAMFormatException: Does not seem like a BAM file. Could someone please help me with this issue?
Also, I would like to mention that this is my first project with the mitochondrial genome, so if I'm making any mistakes in the previous steps, please correct me.
Thank you for guiding me @Pierre Lindenbaum. I'm able to get BAM file correctly now. :)
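For anyone landing here later: the advice that fixed this is not reproduced in the thread, so the following is only a sketch of the likely fix. `samtools view` writes SAM text by default, and the last command in the question has no `-b`, so it would produce a text file no matter what the output is named. The function name `filter_proper_pairs` is made up for illustration; the flags and paths are the ones from the question.

```shell
# Hypothetical helper (not from the thread): keep reads with flag bits 0x1 and 0x2
# set (paired, and mapped in a proper pair) and write BAM instead of SAM text.
filter_proper_pairs() {
    in_bam=$1
    out_bam=$2
    # -b asks samtools view for BAM output; without it the output is SAM text.
    samtools view -b -f 0x3 "$in_bam" > "$out_bam"
}

# Usage with the paths from the question:
# filter_proper_pairs /home/user/mitochondrial_analysis/SRR6423582_sorted.bam /home/user/mitochondrial_analysis/SRR6423582_MT.bam
```

Note that `-f 0x3` selects properly paired reads; it only amounts to "mitochondrial reads" here because the alignment was run against chrM alone.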
As a general note, in case this wasn't obvious: file extensions on Unix systems are merely conventions; they don't actually "mean" anything. Just because a file name ends in .bam doesn't guarantee anything about the file's content.
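To check the content rather than the name, one rough option is to look at the first two bytes. BAM files are BGZF-compressed, and BGZF is gzip-compatible, so a real BAM file starts with the gzip magic bytes 1f 8b, while a SAM text file does not. The helper name `looks_compressed` below is made up for illustration, and the check only shows that a file is gzip-like, not that it is a valid BAM.

```shell
# Hypothetical helper (not from the thread): succeed if the file starts with the
# gzip magic bytes 1f 8b, as every BGZF-compressed BAM file does.
looks_compressed() {
    # Read the first two bytes and print them as hex with no offsets or spaces.
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Usage with the file from the question:
# looks_compressed /home/user/mitochondrial_analysis/SRR6423582_MT.bam && echo "gzip/BGZF" || echo "not compressed"
```

For a stricter check, `samtools quickcheck` examines the BAM header and end-of-file marker and exits non-zero if the file looks wrong.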