Hi,
I'm not able to sort my BAM files correctly using samtools sort: the BAM file size increases after sorting. Aligner: LAST. Dataset: human genome. I generated a MAF file, which I converted to SAM, then to BAM, and then to a sorted BAM (the problematic step).
1) Converting the LAST output from MAF to SAM
./maf-convert sam last-941/src/SRR2928269.test.maf > SRR2928269.last.sam
2) I added a header to the SAM and converted it to BAM
samtools view -bT test/hg38.fa SRR2928269.last.sam > SRR2928269.last.bam
3) BAM sorting
samtools sort -@ 5 SRR2928269.last.bam -T /tmp/SRR2928269.last.bam.sort -o SRR2928269.last.sorted.bam
(This sometimes does NOT work on bigger datasets.)
samtools sort -@ 10 SRR2928269.last.bam -o SRR2928269.last.sorted.bam
(This command generates a much bigger BAM file than the unsorted BAM file.)
BAM file sizes
Dataset 1
60G ----> SRR2928269.last.bam
91G ----> SRR2928269.last.sorted.bam **(after sorting BAM)**
Dataset 2
61G ----> SRR2928268.last.bam
95G ----> SRR2928268.last.sorted.bam **(after sorting BAM)**
Dataset 3
83G ----> SRR2928267.last.bam
122G ----> SRR2928267.last.sorted.bam **(after sorting BAM)**
Can anyone comment on this and suggest how to sort it out?
@OP: my understanding is that a sorted BAM is slightly smaller than the unsorted BAM. Try taking a small subset of your unsorted BAM, sorting it, and comparing the sizes.
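A minimal sketch of that subset test, in case it helps. The function name `subset_sort_compare`, the output names `subset.bam` and `subset.sorted.bam`, and the default of 1000000 lines are placeholders I made up, not anything from your pipeline. Note that the line count includes the header lines.

```shell
# Sketch only: sort a small subset of a BAM and compare the file sizes.
# The function name, output names, and default line count are placeholders.
subset_sort_compare() {
    in_bam=$1
    n_lines=${2:-1000000}
    # Keep the header plus the first alignments and write them back out as BAM.
    samtools view -h "$in_bam" | head -n "$n_lines" | samtools view -b -o subset.bam -
    # Sort the subset by coordinate (the samtools sort default).
    samtools sort -o subset.sorted.bam subset.bam
    # Print both sizes in bytes: unsorted first, then sorted.
    wc -c subset.bam subset.sorted.bam
}

# Example call (input name taken from the question):
# subset_sort_compare SRR2928269.last.bam
```

If the sorted subset also comes out larger, the behaviour is reproducible on a small file, which is much easier to share and debug than a 60G one.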
No, the sorted BAM is much bigger than the unsorted BAM file. Why? I tried sorting once again and it generated a BAM file of the same size as before.
I cannot reproduce this. My sorted BAMs are always slightly smaller than the unsorted ones. Check whether the number of reads is the same, and then proceed with your analysis.
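One way to do the read-count check is with `samtools view -c`, which prints the number of alignment records in a file. This is only a sketch: the function name `same_read_count` is a hypothetical helper, not part of samtools.

```shell
# Sketch only: check that sorting kept every alignment record.
# same_read_count is a hypothetical helper name, not a samtools command.
same_read_count() {
    # samtools view -c prints the number of alignment records in the file.
    unsorted=$(samtools view -c "$1")
    sorted=$(samtools view -c "$2")
    echo "unsorted: $unsorted  sorted: $sorted"
    # Return success only if both counts are equal.
    [ "$unsorted" -eq "$sorted" ]
}

# Example call (file names taken from the question):
# same_read_count SRR2928269.last.bam SRR2928269.last.sorted.bam && echo "counts match"
```

If the counts match, no records were lost or duplicated during sorting, so the size difference comes from how the data is stored rather than from its content.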
Then something is wrong.
Just out of interest, @OP: is it correct that you use hg38.fa together with SRR2928269, as in your command? SRR2928269 is RNA-seq from a monkey.
pinninti1991reddy: Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time. Thank you!