Question

How to merge paired-end reads from sam files?

0

7.8 years ago

aquaq ▴ 40

Hi,

I have paired-end sequencing data, and I have aligned the forward and reverse reads with bwa mem. The forward and reverse reads are each 120 nucleotides long and together cover a 180-nucleotide region of a genome, so they overlap.

bwa mem  $REF $file1 $file2 -t 20 > $sam

When I open the sam output file, the first lines begin like this:

M00135:404:HBJFESJSN:2:1101:2016:1297   53      ref   ...
M00135:404:HBJFESJSN:2:1101:2016:1297   133     ref   ...
M00135:404:HBJFESJSN:2:1101:2646:1297   53      ref   ...
M00135:404:HBJFESJSN:2:1101:2646:1297   133     ref   ...
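The second column of each SAM line is a bitwise flag. As a minimal sketch (the helper name `decode_flag` and the label strings are illustrative and not part of any tool; the bit values are the ones defined in the SAM specification), the flag can be unpacked like this:

```python
# Bit values follow the SAM specification; the label strings are just
# illustrative names chosen for this sketch.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM flag."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


for value in (53, 133):
    print(value, decode_flag(value))
# 53 ['paired', 'unmapped', 'reverse', 'mate_reverse']
# 133 ['paired', 'unmapped', 'second_in_pair']
```

Both values shown above have the 0x4 (unmapped) bit set, which may be worth double-checking against the real file, since the lines here are truncated.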

For every pair, I have two lines, one for each read aligned to the reference from its own direction (I know this is the normal output). Is it possible to combine the forward and reverse reads into one sequence, thus getting a single 180-nucleotide alignment for each pair?

Many thanks!

EDIT: sorry for not being clear, I would like to merge pairs after alignment is done.
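To make the request concrete, here is a rough sketch of what merging after alignment could look like. It is not taken from any existing tool: `merge_pair` is a hypothetical helper, and it assumes both mates align to the same reference without indels or clipping (CIGAR consisting only of M), so that reference offsets equal read offsets. In the overlap it simply keeps the base with the higher quality character.

```python
def merge_pair(pos1, seq1, qual1, pos2, seq2, qual2):
    """Merge two overlapping mates into one sequence.

    pos1/pos2: 1-based leftmost reference positions (SAM column 4).
    seq/qual:  SAM SEQ and QUAL columns, which SAM already stores on the
               forward reference strand.
    Assumption: ungapped, unclipped alignments (CIGAR is all M).
    Returns (leftmost position, merged sequence, merged qualities).
    """
    # Put the leftmost mate first.
    if pos2 < pos1:
        pos1, seq1, qual1, pos2, seq2, qual2 = pos2, seq2, qual2, pos1, seq1, qual1

    offset = pos2 - pos1  # where mate 2 starts, relative to mate 1
    if offset > len(seq1):
        raise ValueError("mates neither overlap nor abut; nothing to merge")

    merged_seq, merged_qual = [], []
    total = max(len(seq1), offset + len(seq2))
    for i in range(total):
        j = i - offset
        in1 = i < len(seq1)
        in2 = 0 <= j < len(seq2)
        if in1 and in2:
            # Overlap: keep the higher-quality base (ties go to mate 1).
            if qual2[j] > qual1[i]:
                base, q = seq2[j], qual2[j]
            else:
                base, q = seq1[i], qual1[i]
        elif in1:
            base, q = seq1[i], qual1[i]
        else:
            base, q = seq2[j], qual2[j]
        merged_seq.append(base)
        merged_qual.append(q)
    return pos1, "".join(merged_seq), "".join(merged_qual)


# Toy example: two 6-nt mates overlapping by 3 nt, with one disagreement
# in the overlap where mate 1 has a low-quality base ('#').
print(merge_pair(1, "ACGTAC", "IIIII#", 4, "TATGGA", "IIIIII"))
# (1, 'ACGTATGGA', 'IIIIIIIII')
```

With two 120-nt reads starting 60 positions apart, the same logic would give a 180-nt merged sequence. Real alignments with indels or soft clipping would need the CIGAR string to be taken into account, which this sketch does not attempt.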

seq bwa paired-end • 4.0k views

ADD COMMENT • link updated 7.7 years ago by WouterDeCoster 47k • written 7.8 years ago by aquaq ▴ 40

2

7.7 years ago

WouterDeCoster 47k

I just saw this tool by chance, but obviously I have no idea how well it works.

ADD COMMENT • link 7.7 years ago by WouterDeCoster 47k

score 2 · Accepted Answer · 2017-02-23

2

7.8 years ago

WouterDeCoster 47k

BBMerge can do this :)

ADD COMMENT • link 7.8 years ago by WouterDeCoster 47k

0

Thanks. I have used pandaseq for this problem as well, but I would like to merge sequences after alignment, not before... I am sorry, I was not clear on this.

ADD REPLY • link 7.8 years ago by aquaq ▴ 40

1

I'm not sure what your biological motivation is for this objective, but I'm completely against tampering with alignment data. Which problem are you trying to solve?

ADD REPLY • link 7.8 years ago by WouterDeCoster 47k

0

It would be just a trial. In a specific part of the sequence that we are interested in, there is a large number of mutations/sequencing errors (it was a random sequence, but it was not supposed to be that random). Before continuing with further analysis, I just wanted to be sure that this is not caused by some weird behaviour of pandaseq that I am not aware of. But I can totally accept that this is unusual; I will find another way to confirm it (e.g. by running bbmerge and comparing the results). Thanks for the help!

ADD REPLY • link 7.8 years ago by aquaq ▴ 40

0

I would also like to do this, and yes, after alignment, because I am using a downstream application that needs a merged PE format, but the alignments contain < 1% of the total original fastq reads, and it will be much more efficient to merge only the aligned reads. Did you try using aftermerge? How did it go?
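One way to approach the "merge only the aligned reads" case is to pull the mapped pairs back out of the SAM file as FASTQ and then run a pre-alignment merger (BBMerge, pandaseq) on that much smaller set. The sketch below illustrates the idea on plain SAM text lines; `mapped_pairs_to_fastq` and `revcomp` are hypothetical helpers written for this example, and in practice an established tool such as `samtools fastq` is likely the more robust route.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse-complement a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def mapped_pairs_to_fastq(sam_lines):
    """Collect FASTQ records for pairs in which both mates are mapped.

    Returns two lists of FASTQ record strings (mate 1, mate 2), in matching
    order. Secondary and supplementary alignments are ignored.
    """
    mates = {}
    for line in sam_lines:
        if line.startswith("@"):        # header line
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if flag & 0x900:                # secondary (0x100) or supplementary (0x800)
            continue
        if flag & 0x4 or flag & 0x8:    # this read or its mate is unmapped
            continue
        if flag & 0x10:
            # SAM stores reverse-strand reads reverse-complemented;
            # restore the original read orientation.
            seq, qual = revcomp(seq), qual[::-1]
        if flag & 0x40:
            which = 1
        elif flag & 0x80:
            which = 2
        else:
            continue                    # not marked as first or second in pair
        mates.setdefault(name, {})[which] = f"@{name}/{which}\n{seq}\n+\n{qual}\n"

    complete = [m for m in mates.values() if 1 in m and 2 in m]
    return [m[1] for m in complete], [m[2] for m in complete]


# Usage sketch with a hypothetical file name:
# with open("aligned.sam") as handle:
#     r1_records, r2_records = mapped_pairs_to_fastq(handle)
```

The two returned lists can be written to a pair of FASTQ files and passed to the merger of choice.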

ADD REPLY • link 7.1 years ago by norah.saarman • 0