Hi,
I have paired-end sequencing data and aligned the forward and reverse reads with bwa mem. Both reads are 120 nucleotides long and together cover a 180-nucleotide region of the genome, so they overlap by 60 nucleotides.
bwa mem -t 20 "$REF" "$file1" "$file2" > "$sam"
When I open the SAM output file, the first lines look like this:
M00135:404:HBJFESJSN:2:1101:2016:1297 53 ref ...
M00135:404:HBJFESJSN:2:1101:2016:1297 133 ref ...
M00135:404:HBJFESJSN:2:1101:2646:1297 53 ref ...
M00135:404:HBJFESJSN:2:1101:2646:1297 133 ref ...
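To see what the second column (FLAG) says about each record, a small Python sketch can decode it. The bit values come from the SAM specification; the short names are illustrative labels, not part of any library. For the two values shown above, 53 decodes to paired, unmapped, reverse, mate_reverse and 133 to paired, unmapped, second_in_pair, so both records have bit 0x4 (segment unmapped) set. That is worth checking before trying to merge anything.

```python
# Bit values from the SAM specification; the names are illustrative labels.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list[str]:
    """Return the names of all bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    # The FLAG values from the SAM lines in the question.
    for flag in (53, 133):
        print(flag, decode_flag(flag))
```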
For every pair, I get two lines, one per read, aligned to the reference in opposite directions (I know, this is the normal output). Is it possible to combine the forward and reverse reads into one sequence, giving a single 180-nucleotide alignment for each pair?
Many thanks!
EDIT: Sorry for being unclear: I would like to merge the pairs after the alignment is done.
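A minimal sketch of merging one overlapping pair after alignment, in Python. It is not pandaseq's or any aligner's method, and it assumes a lot: both mates map to the same reference, each CIGAR is a single match block such as `120M` (no indels, no soft clipping), and a disagreement in the overlap is resolved by keeping the base with the higher Phred+33 quality. `AlignedRead` and `merge_overlapping_pair` are hypothetical names. The sequences can be merged position by position without reverse-complementing, because the SAM specification stores SEQ for reverse-strand (0x10) records already reverse-complemented onto the reference forward strand.

```python
import re
from dataclasses import dataclass


@dataclass
class AlignedRead:
    """Minimal view of one SAM record (hypothetical helper, not a library type)."""
    pos: int    # 1-based leftmost reference position (SAM POS column)
    cigar: str  # SAM CIGAR string
    seq: str    # SAM SEQ, already on the reference forward strand
    qual: str   # SAM QUAL, Phred+33


def merge_overlapping_pair(r1: AlignedRead, r2: AlignedRead) -> AlignedRead:
    """Merge two overlapping (or adjacent) mates into one read spanning both.

    Only CIGARs made of a single match block (e.g. "120M") are supported.
    In the overlap, the base with the higher quality wins; ties keep r1's base.
    """
    for read in (r1, r2):
        if not re.fullmatch(r"\d+M", read.cigar):
            raise ValueError(f"unsupported CIGAR {read.cigar!r}: only plain matches handled")
        if int(read.cigar[:-1]) != len(read.seq) or len(read.qual) != len(read.seq):
            raise ValueError("CIGAR, SEQ and QUAL lengths must agree")

    # Map each reference position to (base, quality character).
    columns: dict[int, tuple[str, str]] = {}
    for read in (r1, r2):
        for offset, (base, q) in enumerate(zip(read.seq, read.qual)):
            ref_pos = read.pos + offset
            previous = columns.get(ref_pos)
            # Keep the existing base unless the new one has a strictly higher quality.
            if previous is None or ord(q) > ord(previous[1]):
                columns[ref_pos] = (base, q)

    start, end = min(columns), max(columns)
    if len(columns) != end - start + 1:
        raise ValueError("mates do not overlap or touch; cannot merge without a gap")
    bases = "".join(columns[p][0] for p in range(start, end + 1))
    quals = "".join(columns[p][1] for p in range(start, end + 1))
    return AlignedRead(pos=start, cigar=f"{len(bases)}M", seq=bases, qual=quals)
```

For the pair described above (two `120M` mates overlapping by 60 nt), the result would be one `180M` record. Real bwa mem output often contains soft clips and indels, so a complete implementation would have to walk every CIGAR operation instead of rejecting them.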
Thanks. I have also used pandaseq for this, but I would like to merge the sequences after alignment, not before. Sorry, I was not clear about this.
I'm not sure what your biological motivation for this is, but I'm completely against tampering with alignment data. What problem are you trying to solve?
It would just be a test. In a specific part of the sequence that we are interested in, there is a large number of mutations/sequencing errors (it was a random sequence, but it was not supposed to be that random). Before continuing with further analysis, I wanted to make sure this is not caused by some behaviour of pandaseq that I am not aware of. But I completely understand if this is unusual; I will find another way to confirm it (e.g. by running bbmerge and comparing the results). Thanks for the help!
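For the bbmerge cross-check mentioned above, a small Python sketch can compare two merged outputs read by read. It assumes both tools wrote FASTQ (4 lines per record, no line wrapping) and that read names match once everything after the first whitespace is dropped; the header formats of pandaseq and bbmerge may need extra normalisation, which is not shown. `read_fastq` and `compare_merged` are hypothetical helpers.

```python
from typing import Iterable, Iterator


def read_fastq(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (read_id, sequence) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.startswith("@"):
            raise ValueError(f"expected a FASTQ header, got {header!r}")
        try:
            seq = next(it).rstrip("\n")
            next(it)  # '+' separator line
            next(it)  # quality line
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        # Keep only the name up to the first whitespace (drops comments like "1:N:0:1").
        yield header[1:].split()[0], seq


def compare_merged(a: Iterable[str], b: Iterable[str]) -> dict[str, tuple[str, str]]:
    """Return {read_id: (seq_a, seq_b)} for reads in both inputs whose merged sequences differ."""
    seqs_a = dict(read_fastq(a))
    differing = {}
    for read_id, seq_b in read_fastq(b):
        seq_a = seqs_a.get(read_id)
        if seq_a is not None and seq_a != seq_b:
            differing[read_id] = (seq_a, seq_b)
    return differing
```

Open file objects iterate line by line, so they can be passed directly. Reads that only one tool managed to merge are ignored here; counting them separately would be a natural extension.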
I would also like to do this, and yes, after alignment: my downstream application needs merged paired-end reads, but less than 1% of the original FASTQ reads align, so it would be much more efficient to merge only the aligned reads. Did you try aftermerge? How did it go?
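Since only a small fraction of reads align, the pairs worth merging can be picked out of the SAM file first. A minimal Python sketch, assuming plain-text SAM input: it keeps pairs where neither FLAG bit 0x4 (read unmapped) nor 0x8 (mate unmapped) is set, which is the same flag test `samtools view -F 12` applies, and additionally skips secondary (0x100) and supplementary (0x800) records so each read name yields at most one primary pair. `mapped_pairs` is a hypothetical helper name.

```python
from typing import Iterable, Iterator

UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
FIRST_IN_PAIR = 0x40
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapped_pairs(sam_lines: Iterable[str]) -> Iterator[tuple[list[str], list[str]]]:
    """Yield (read1_fields, read2_fields) for pairs where both mates are mapped.

    Header lines ('@...') and non-primary records are skipped. Mates do not
    need to be adjacent; unmatched records wait in memory until their mate appears.
    """
    pending: dict[str, list[str]] = {}
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if flag & (UNMAPPED | MATE_UNMAPPED):
            continue
        qname = fields[0]
        mate = pending.pop(qname, None)
        if mate is None:
            pending[qname] = fields
        elif int(mate[1]) & FIRST_IN_PAIR:
            yield mate, fields
        else:
            yield fields, mate
```

Each yielded pair could then be passed to a merging step; records whose mate never appears stay in `pending` and are simply not yielded.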