Question

combining / merging overlapping reads in bam file to one in silico long read

0

Entering edit mode

5.0 years ago

Lilly • 0

Hi there! I am trying to merge overlapping reads in a bam file to one in silico long read. An example:

chr1    1    4    ACAC
chr1    3    6    ACTT
chr1    6    9    TGGG
chr1    11   14   ACTG

The desired result is:

chr1    1    9    ACACTTGGG
chr1    11   14   ACTG

I tried bedtools merge which gives me the start and end position of the merged reads, even the count and read identifiers of reads that overlap and were merged, but unfortunately not the actual merged read sequence. When trying to use -c 4 -o collapse -delim "" the sequences unfortunately just get concatenated and therefore some bases are doubled, since they are not observed back to back, but overlapping.

I also read into getting consensus sequence from bam files and clipping of overlapping read pair mates, but all that is not what I want to achieve. Any help is appreciated! Thanks!

genome sequence assembly bam overlap • 2.2k views

ADD COMMENT • link 5.0 years ago by Lilly • 0

score 0 · Answer 1 · 2020-05-05

0

Entering edit mode

5.0 years ago

rbagnall ★ 1.8k

After bedtools merge, you could pipe the output into bedtools getfasta

ADD COMMENT • link 5.0 years ago by rbagnall ★ 1.8k

0

Entering edit mode

thanks rbagnall, but do you mean I should go back to the ref genome and get the sequence from there? unfortunately, that is not possible, since I actually work with the SNPs covered by the overlapping reads.

ADD REPLY • link 5.0 years ago by Lilly • 0

0

Entering edit mode

Can you get the reference genome to which your BAM file is aligned? I'm not sure how you could join together the sequence using only the example format from above.

ADD REPLY • link 5.0 years ago by rbagnall ★ 1.8k

0

Entering edit mode

Yes, I have the reference genome, and I know the heterozygous SNPs of it. However, those SNPs are captured by the short reads in my bam which I want to merge. Taking the start and end position of the merged reads and getting the respective sequence from the reference genome removes the information of covered SNPs in my bam.

ADD REPLY • link 5.0 years ago by Lilly • 0

0

Entering edit mode

Hi Julia, Have you had any progress in merging reads in the way you described? I have a similar problem and tried aftermerge (https://github.com/KoenHerten/aftermerge), but so far with no success.

ADD REPLY • link 4.2 years ago by lechu ▴ 20