I have some large BAM files for a pipeline, but the RAM ran out while the pipeline was running. Sorting the BAM causes fatal bugs in my pipeline that cannot be debugged for now, so that is not a solution. The only option is to downsize the input (upgrading the RAM is too expensive for now).
Then I thought I could split the original BAM into several smaller BAM files, but I cannot skip the sorting step. For example,
samtools view
requires the file to be sorted and indexed before splitting.
Is there any other way to do this?
Thank you.
samtools view -h
should not need a sorted BAM. If you split the BAM file (creating multiple intermediate SAM files), make sure you add the header to all the pieces; otherwise the files may be unusable.
I've tried. It requires an indexed BAM, but indexing requires a sorted BAM.
How so? You are simply converting a BAM file to SAM using
samtools view -h
.
Here is the command I used to split the BAM:
samtools view -h -@ 48 sampleAligned.out.bam chr1 > sample_1.bam
Because I specified a region, it requires a sorted and indexed BAM.
May I ask how to create split BAM files using your method? Thanks.
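For reference, the header-preserving split suggested earlier in the thread might look roughly like the sketch below. It is only an illustration: the function name `split_sam_with_header`, the chunk size, and the `sample_part` prefix are made up here, and it assumes `awk` is available. It reads a SAM stream (as produced by `samtools view -h`), copies the header into every chunk, and needs no sorting or indexing.

```shell
# Split a SAM stream (header + records) read from stdin into numbered SAM
# chunks, writing the full header at the top of every chunk.
# Usage: split_sam_with_header <records_per_chunk> <output_prefix>
split_sam_with_header() {
    awk -v n="$1" -v prefix="$2" '
        # Header lines start with "@"; collect them until the first record.
        /^@/ && !in_records { header = header $0 "\n"; next }
        {
            in_records = 1
            if (count % n == 0) {            # time to start a new chunk
                if (out != "") close(out)
                out = sprintf("%s_%03d.sam", prefix, ++chunk)
                printf "%s", header > out    # every chunk gets the header
            }
            print > out
            count++
        }
    '
}

# Example (the chunk size of 1,000,000 records is an arbitrary choice):
# samtools view -h sampleAligned.out.bam | split_sam_with_header 1000000 sample_part
```

For paired-end data it is probably safer to pick an even chunk size so that adjacent mates are less likely to be separated, although this does not guarantee it when secondary or supplementary alignments are present.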
This is new information. Your original post did not mention splitting the BAM by a specific region. If you need to do that, then the file will indeed have to be sorted.
If you don't have the resources to sort the whole file, you could split the unsorted file first. Then sort and index the pieces, grab the regions you need from each piece, merge the region-specific files, and sort the result again. This will be a lot more work but will eventually get you what you need.
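A minimal sketch of that workflow, assuming the unsorted file has already been cut into header-bearing pieces (SAM or BAM). The function name `extract_region` and the derived file names (`*.sorted.bam`, `*.region.bam`) are invented for illustration; only standard `samtools sort`, `index`, `view`, and `merge` calls are used.

```shell
# Extract one region from unsorted chunk files without sorting the full BAM.
# Usage: extract_region <region> <output.bam> <chunk1> [chunk2 ...]
extract_region() {
    region="$1"; out="$2"; shift 2
    for chunk in "$@"; do
        base="${chunk%.*}"
        # Sort and index each small piece; memory use is bounded by chunk size.
        samtools sort -o "${base}.sorted.bam" "$chunk" || return 1
        samtools index "${base}.sorted.bam" || return 1
        # Pull the region of interest out of this piece.
        samtools view -b -o "${base}.region.bam" "${base}.sorted.bam" "$region" || return 1
        # Swap the chunk for its region file in the argument list
        # (the loop list was expanded once, so this is safe).
        set -- "$@" "${base}.region.bam"
        shift
    done
    # "$@" now holds only the per-chunk region files; merge and index them.
    samtools merge -f "$out" "$@" || return 1
    samtools index "$out"
}

# Example (chunk names are hypothetical):
# extract_region chr1 sample_chr1.bam sample_part_001.sam sample_part_002.sam
```

Because `samtools merge` takes coordinate-sorted inputs and writes a sorted output, the final re-sort may turn out to be unnecessary; running `samtools sort` on the merged file afterwards is harmless if in doubt.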