Split a BAM within a chromosome without losing read pairs
7.8 years ago
Ian Fiddes

I am trying to figure out a way to split a BAM file into many small pieces without splitting up read pairs. The resulting BAM files need to be name-sorted, although that operation could be performed on each chunk after the splitting process.

As far as I can tell, there aren't any existing tools that do this. Does anyone know of anything?

Tags: RNA-Seq, bam

Name-sorting the file first and then splitting it would very likely be much easier, since both reads of a pair share the same name in a BAM file.
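
A minimal sketch of the "name-sort, then split" idea, assuming the input has already been name-sorted (for example with `samtools sort -n`) so that all records sharing a query name are adjacent. Reads are modelled here as hypothetical `(query_name, reference, position)` tuples rather than real BAM records; in practice they would come from a BAM reader such as pysam. Cuts are only made between name groups, so no pair is split, and each chunk stays name-sorted because it is a contiguous slice of a name-sorted stream.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

# Hypothetical minimal read record: (query_name, reference, position).
Read = Tuple[str, str, int]


def chunk_by_name(reads: Iterable[Read], max_names: int) -> Iterator[List[Read]]:
    """Yield chunks of a name-sorted read stream, never splitting a read name.

    Because the input is name-sorted, every record with the same query name
    (both mates, plus any secondary/supplementary alignments) is consecutive,
    so cutting only between name groups keeps each pair in one chunk.
    """
    if max_names < 1:
        raise ValueError("max_names must be at least 1")
    chunk: List[Read] = []
    names_in_chunk = 0
    for _, group in groupby(reads, key=lambda r: r[0]):
        # Start a new chunk once the current one holds max_names read names.
        if names_in_chunk == max_names:
            yield chunk
            chunk, names_in_chunk = [], 0
        chunk.extend(group)
        names_in_chunk += 1
    if chunk:
        yield chunk
```

For example, six records from three pairs split with `max_names=2` give one chunk holding pairs `a` and `b` and a second chunk holding pair `c`. Counting read names rather than records keeps chunk sizes roughly even while guaranteeing a name never straddles two chunks.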

The first thing that comes to mind is that you could probably split at the centromere without issues, but I assume you want to split more than once.

7.8 years ago

Since RNA-Seq is among your tags, I assume this is RNA-Seq data. If so, splitting at intergenic regions should decouple only a relatively small number of pairs, because few fragments span those regions. You would probably end up with a few thousand files. You can then use samtools fixmate to keep only paired reads within each file.
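
A pure-Python sketch of splitting at fixed breakpoints and then discarding pairs whose mates ended up in different files. The breakpoint positions (assumed to lie in intergenic regions) and the `(query_name, reference, position)` read tuples are hypothetical stand-ins; this illustrates the bookkeeping only and is not how samtools fixmate works internally.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical minimal read record: (query_name, reference, position).
Read = Tuple[str, str, int]
ChunkKey = Tuple[str, int]  # (reference, bin index)


def split_at_breakpoints(
    reads: List[Read], breakpoints: Dict[str, List[int]]
) -> Dict[ChunkKey, List[Read]]:
    """Assign reads to (reference, bin) chunks, then drop split pairs.

    `breakpoints` maps a reference name to a sorted list of cut positions.
    A read at position p goes into bin bisect_right(cuts, p), so positions
    below a cut fall on its left and positions at or above it on its right.
    """
    chunks: Dict[ChunkKey, List[Read]] = defaultdict(list)
    for read in reads:
        _, ref, pos = read
        key = (ref, bisect_right(breakpoints.get(ref, []), pos))
        chunks[key].append(read)

    # Record every chunk each read name landed in.
    chunks_of_name: Dict[str, Set[ChunkKey]] = defaultdict(set)
    for key, members in chunks.items():
        for name, _, _ in members:
            chunks_of_name[name].add(key)

    # Keep a read only if all records sharing its name are in the same chunk.
    return {
        key: [r for r in members if len(chunks_of_name[r[0]]) == 1]
        for key, members in chunks.items()
    }
```

With a single cut at position 500 on `chr1`, a pair at 100/300 stays together in bin 0, while a pair at 450/700 straddles the cut and is removed from both bins. Mates mapped to different references also land in different chunks and are dropped.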

In general, I think finding the optimal split that minimises the number of decoupled pairs is quite a difficult problem, and it may be impractical anyway.
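
Scoring individual cut points is easy even if the joint optimisation is hard. The sketch below, assuming each pair is summarised by the hypothetical `(mate1_pos, mate2_pos)` span on one reference, counts how many pairs each candidate cut would decouple. It does not solve the global problem; it could only feed a simple heuristic, such as choosing the lowest-scoring candidate within each window.

```python
from bisect import bisect_left
from typing import List, Tuple


def decoupled_counts(
    pair_spans: List[Tuple[int, int]], candidates: List[int]
) -> List[int]:
    """For each candidate cut, count the pairs that cutting there would split.

    Reads at positions below the cut go left and those at or above it go
    right, so a pair with lo <= hi is split exactly when lo < cut <= hi.
    Since lo <= hi, that count equals #(lo < cut) - #(hi < cut).
    """
    los = sorted(min(x, y) for x, y in pair_spans)
    his = sorted(max(x, y) for x, y in pair_spans)
    # bisect_left on a sorted list gives the number of values strictly below cut.
    return [bisect_left(los, cut) - bisect_left(his, cut) for cut in candidates]
```

Sorting once and using binary search keeps the cost at O((n + m) log n) for n pairs and m candidate cuts, which matters when scoring many candidates against millions of pairs.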
