Hi All, I need to merge every forward read of a PE sequencing experiment with its corresponding mate, whenever a pair is overlapping. I am aware of similar threads (e.g., this), but none provide a satisfactory solution. I know that there are tools to merge reads without alignment, but I would like to use alignment information to improve the fidelity of the merge. I tried aftermerge.py, but without success (errors, see below). I would be happy to hear about potential other ways to accomplish the task, or from someone who had more luck with aftermerge.
Aftermerge error 1:
[E::sam_parse1] CIGAR and query sequence are of different length
[W::sam_read1] Parse error at line 70
[main_samview] truncated file.
Aftermerge error 2:
➜ test_map_full samtools view -h TI-2680-RTA10_S16_interleaved.fastq_slamdunk_mapped.bam | aftermerge.py - output.sam Traceback (most recent call last): File "/Users/lecka48/scripts/aftermerge.py", line 219, in <module>
(newseq, newqual, newmcigar) = mergeSequences(seq1, seq2, qual1, qual2, mcigar1, mcigar2, refstart, materead.pos) File "/Users/lecka48/scripts/aftermerge.py", line 88, in mergeSequences
(newseq, newqual, newmcigar) = mergeSequences(seq1[1:], seq2[1:], qual1[1:], qual2[1:], mcigar1[1:], mcigar2[1:], start1+1, start2+1) File "/Users/lecka48/scripts/aftermerge.py", line 66, in mergeSequences
(newseq, newqual, newmcigar) = mergeSequences(seq1[1:], seq2[1:], qual1[1:], qual2[1:], mcigar1[1:], mcigar2[1:], start1+1, start2+1) File "/Users/lecka48/scripts/aftermerge.py", line 66, in mergeSequences
(newseq, newqual, newmcigar) = mergeSequences(seq1[1:], seq2[1:], qual1[1:], qual2[1:], mcigar1[1:], mcigar2[1:], start1+1, start2+1) File "/Users/lecka48/scripts/aftermerge.py", line 66, in mergeSequences
(newseq, newqual, newmcigar) = mergeSequences(seq1[1:], seq2[1:], qual1[1:], qual2[1:], mcigar1[1:], mcigar2[1:], start1+1, start2+1) [Previous line repeated 67 more times] File "/Users/lecka48/scripts/aftermerge.py", line 64, in mergeSequences
if (mcigar1[0] == mcigar2[0]): IndexError: string index out of range
Finally, I'd like to highlight I am looking for a program that includes information about the alignment to reference genome (coordinates of each read) for merging. All methods described here Tools to merge overlapping paired-end reads seem to assemble the reads directly from fastq files, using the overlap between mates to do the job.
You should go back and check your alignments to see why there are differences in CIGAR and query. Try a different aligner out. Perhaps
aftermerge
requires an older format (SAM v.1.3?)Thanks Genomax for your help! The alignments are fine, the problem occurs with the SAM file produced by merging. First, it does not have a header, and second, the CIGARs are problematic:
It might be that a different aligner would work, but I have to use NextGenMap for that. I may try bbmerge. The aim here is to deal with the pair overlaps to avoid counting variants twice. I can't believe this is not a more commonly used approach, and there is such a scarcity of tools.