Question

Split BAM into 1 BAM per contig

0

Entering edit mode

9.4 years ago

Marvin ▴ 220

Hi,

I have mapped reads against a reference with bwa and got a big bam file now.

I would like to retrieve a separate bam file for every contig.

I do not mean "I want 1 bam for every chromosome". A chromosome is not a contig since my coverage is 0 in many parts of the genome (aDNA reads).

In this situation e.g. (I hope he doesn't mess up when I post this):

--------------------------------------
  ---   ---               ---   ---
    ----                    -----

I would like to get 2 bam files.

How do I do that?

sam contig • 3.5k views

ADD COMMENT • link updated 23 months ago by Ram 44k • written 9.4 years ago by Marvin ▴ 220

Ram · Answer 1 · 2015-06-30

2

Entering edit mode

9.4 years ago

Pierre Lindenbaum 164k

Use genomecov (http://bedtools.readthedocs.org/en/latest/content/tools/genomecov.html) to get a bed with >0 coverage. Then, loop over the bed to extract the reads.

awk '{printf("%s:%s-%s\n",$1,$2,$3);}' cov.bed | while read F; do samtools view -o "${F}.bam" -b your.bam "${F}"; done

ADD COMMENT • link updated 23 months ago by Ram 44k • written 9.4 years ago by Pierre Lindenbaum 164k

0

Entering edit mode

Thank you! Don't know how to use that bedtools genomecov but I will find out.

ADD REPLY • link updated 23 months ago by Ram 44k • written 9.4 years ago by Marvin ▴ 220

0

Entering edit mode

I'm struggling, how do I achieve this?

ADD REPLY • link updated 23 months ago by Ram 44k • written 9.4 years ago by Marvin ▴ 220

0

Entering edit mode

I just can't figure it out. The problem is that he opens a new bed line as soon as the coverage within a contig changes. But what I need is 1 line per contig. How is that possible with a single genomecov command? I know how to do it with a pipe but from your answer it sounds like it could be done immediately?

ADD REPLY • link 9.4 years ago by Marvin ▴ 220

0

Entering edit mode

Oh my god I finally made it.

Thank you for helping me alot by pointing me towards the right direction (bedtools is the right choice [didn't know that program before], then awk with samtools as posted by you). However that genomecov suggestion gave me struggles because it didn't get me what I wanted ... all I need is:

bedtools merge -i aln_sorted.bam

This gives me 1 line per contig. Afterwards I can apply the awk-samtools-pipe that you have posted :)

Thanks again!

ADD REPLY • link 9.4 years ago by Marvin ▴ 220

0

Entering edit mode

Can you share the code you used and the files you used as input? I would also like to obtain individual bam files per contig. I tried this process posted here but I am getting zero output files after running the awk command (without genomecov).

I instead used bedtools merge -i aln_sorted.bam > mycontigs.txt then used the awk command and instead of cov.bed I used mycontigs.txt

Do I need a bed file at all for this?

ADD REPLY • link 5.1 years ago by DNAngel ▴ 250