Should I merge forward and reverse reads before mapping?
10.0 years ago
qiyunzhu

Dear Community,

I have been working on estimating the relative abundance of selected bacterial strains in a metagenomic dataset derived from whole-genome sequencing with an Illumina TruSeq kit and a HiSeq sequencer. My plan is to do initial quality control with Trimmomatic, then map to reference bacterial genomes with Bowtie2. I read that I can merge forward and reverse reads before mapping, using tools like PEAR.

Is this merging step recommended? Will it make the subsequent mapping more accurate?

Also, since I am quantifying the reads mapped to each bacterial genome, I should treat merged and unmerged reads differently in the calculation once I merge and map them. For example, one hit from a merged read should be counted twice, equivalent to two hits from the forward and reverse reads of an unmerged pair. Am I right?
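A minimal Python sketch of the weighting proposed above, assuming each alignment has already been reduced to a hypothetical `Hit` record (target genome plus a flag saying whether the read came from a merged pair). Extracting that information from the Bowtie2 output and the merger's output files is not shown. Counting in "read equivalents" this way lets merged reads and unmerged pairs contribute on the same scale.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    """Hypothetical alignment summary: which genome was hit, and was the read merged?"""
    genome: str
    merged: bool


def count_read_equivalents(hits: Iterable[Hit]) -> Counter:
    """Count hits per genome in units of original (unmerged) reads.

    A merged read stands for both mates of a pair, so it counts as 2;
    each mapped mate of an unmerged pair counts as 1.
    """
    counts: Counter = Counter()
    for hit in hits:
        counts[hit.genome] += 2 if hit.merged else 1
    return counts


def relative_abundance(counts: Counter) -> Dict[str, float]:
    total = sum(counts.values())
    return {genome: n / total for genome, n in counts.items()}


hits = [
    Hit("strainA", merged=True),   # one merged read
    Hit("strainA", merged=False),  # read 1 of an unmerged pair
    Hit("strainA", merged=False),  # read 2 of the same pair
    Hit("strainB", merged=True),
]
counts = count_read_equivalents(hits)
print(counts)  # Counter({'strainA': 4, 'strainB': 2})
print(relative_abundance(counts))
```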

Thanks in advance!

10.0 years ago

I did a comparison of the accuracy of various merging tools here. PEAR did not perform very well.

Generally, though, I don't recommend merging before mapping; it's more relevant to assembly. As you note, it would make coverage calculations a bit trickier, and it could introduce some bias without any particular benefit.
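To see one reason coverage gets trickier, here is a toy Python calculation of per-base depth for a single hypothetical 2x150 bp pair from a 250 bp fragment (50 bp overlap), mapped either as a pair or as one merged read. The fragment and read lengths are made up for illustration, and clipping and mapping errors are ignored.

```python
from typing import List, Tuple


def depth(intervals: List[Tuple[int, int]], length: int) -> List[int]:
    """Per-base depth from half-open [start, end) alignment intervals."""
    cov = [0] * length
    for start, end in intervals:
        for i in range(start, end):
            cov[i] += 1
    return cov


FRAGMENT_LEN = 250
pair = [(0, 150), (100, 250)]  # read 1 and read 2 share positions 100-149
merged = [(0, 250)]            # the same fragment as a single merged read

pair_depth = depth(pair, FRAGMENT_LEN)
merged_depth = depth(merged, FRAGMENT_LEN)

print(sum(pair_depth), sum(merged_depth))  # 300 250
print(pair_depth[120], merged_depth[120])  # 2 1
```

In a dataset that mixes merged reads and unmerged pairs, the overlap region is counted twice only for the pairs, so raw depth is not directly comparable between the two classes.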

Update:

Merging before mapping is actually quite useful for detecting midsize (~50-400bp) insertions, as that ability is dependent on read length. Other than that scenario I still don't recommend it.
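One way to see why read length matters: under a simplified model in which a read can only be aligned across an insertion if it also has at least `min_anchor` bases of matching reference on each side, the largest insertion a single read can contain is its length minus twice the anchor. The 20 bp anchor and the read lengths below are arbitrary assumptions; real aligners decide this through their scoring, not a fixed threshold.

```python
def max_detectable_insertion(read_len: int, min_anchor: int = 20) -> int:
    """Largest insertion one read can span with `min_anchor` reference-matching
    bases on both sides (simplified model; `min_anchor` is hypothetical)."""
    return max(0, read_len - 2 * min_anchor)


for label, read_len in [("single 150 bp mate", 150), ("merged 250 bp read", 250)]:
    print(f"{label}: up to {max_detectable_insertion(read_len)} bp")
```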

Thanks for your valuable information!

I have been wondering: wouldn't longer reads map to the reference more accurately?

Yes, but aligners also try to keep pairs together. So if read 1 could map to 5 locations, and read 2 could map to 3 locations, but there is only one location where both could map nearby, that is the site that will be selected. So there should not be much difference in sensitivity or specificity between paired reads and merged reads.
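The pairing logic described here can be illustrated with a toy Python sketch: given candidate positions for read 1 and read 2 on one reference sequence, keep only the combinations that form a properly oriented (innie) pair within a maximum insert size. The read length, the 600 bp limit, and the candidate positions are made-up assumptions, and real aligners such as Bowtie2 score alignments rather than simply filtering them, so this shows only the constraint, not their implementation.

```python
from itertools import product
from typing import List, Tuple

Hit = Tuple[int, str]  # (leftmost position, strand "+" or "-") on one reference


def concordant_pairs(
    r1_hits: List[Hit], r2_hits: List[Hit], read_len: int = 150, max_insert: int = 600
) -> List[Tuple[Hit, Hit, int]]:
    """Return (read1 hit, read2 hit, insert size) for every proper innie placement."""
    pairs = []
    for (p1, s1), (p2, s2) in product(r1_hits, r2_hits):
        if s1 == s2:
            continue  # innie mates must map to opposite strands
        fwd, rev = (p1, p2) if s1 == "+" else (p2, p1)
        insert = rev + read_len - fwd  # outer distance between the mates
        if fwd <= rev and insert <= max_insert:
            pairs.append(((p1, s1), (p2, s2), insert))
    return pairs


r1 = [(1_000, "+"), (52_000, "+"), (90_000, "-"), (130_500, "+"), (200_000, "-")]  # 5 locations
r2 = [(1_250, "-"), (75_000, "-"), (300_000, "+")]  # 3 locations
print(concordant_pairs(r1, r2))  # [((1000, '+'), (1250, '-'), 400)]
```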

That sounds reasonable. Thanks for your explanation!

Hi Brian,

I am a bit confused by your explanation; could you please clarify a few points?

As I understand it, when two paired reads are merged, only their overlapping part is actually merged. With innie orientation, only the ends (arrowheads) of the reads overlap and are merged, while the longer tail portions remain unchanged. Aren't those tail portions of both reads still available for mapping? If so, mapping would not be affected, would it? And if we preserve the pairing information by merging into one longer read, will that increase mapping accuracy?

Here is an illustration of what I mean:

R1 (maps to 5 locations) ----------------------------------------->
                                                 ||||||||||||||||||
                                                 <----------------------------------- R2 (maps to 3 locations)

Thank you in advance for your ideas and suggestion!

A correctly merged read will map more accurately than either read1 or read2 alone, because it is longer. But when mapped as a pair, the accuracy should be similar whether merged or unmerged. Merging has the advantage of reducing the substitution error rate in the overlapping region, but it has the disadvantage of potentially introducing indels in false-positive merges. That's very rare with BBMerge, though.
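To illustrate how merging can reduce substitution errors in the overlap, here is a simplified Python sketch of merging an innie pair: reverse-complement read 2, accept the longest overlap whose mismatch rate is under a threshold, and at each overlapping position keep the base with the higher Phred quality. This is not BBMerge's or PEAR's algorithm; the thresholds are arbitrary, and it assumes the insert is longer than each read (no adapter read-through). Accepting a wrong overlap would give the wrong fragment length, which is how a false-positive merge can look like an indel after mapping.

```python
from typing import List, Optional, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(
    r1: str, q1: List[int], r2: str, q2: List[int],
    min_overlap: int = 12, max_mismatch_frac: float = 0.1,
) -> Optional[Tuple[str, List[int]]]:
    """Merge an innie pair; return (sequence, qualities) or None if no overlap passes."""
    min_overlap = max(1, min_overlap)    # an overlap of 0 would slice the whole read
    r2rc, q2rc = revcomp(r2), q2[::-1]   # put read 2 on read 1's strand
    # Try the longest overlap first; accept the first one with few enough mismatches.
    for ov in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(r1[-ov:], r2rc[:ov]))
        if mismatches <= max_mismatch_frac * ov:
            break
    else:
        return None  # no acceptable overlap: leave the pair unmerged

    seq, qual = list(r1[:-ov]), list(q1[:-ov])
    start = len(r1) - ov
    for i in range(ov):
        # Consensus: keep whichever mate's base has the higher quality.
        if q1[start + i] >= q2rc[i]:
            seq.append(r1[start + i])
            qual.append(q1[start + i])
        else:
            seq.append(r2rc[i])
            qual.append(q2rc[i])
    seq.extend(r2rc[ov:])
    qual.extend(q2rc[ov:])
    return "".join(seq), qual


fragment = "GATTACACCGTGAGCTTAGC"  # hypothetical 20 bp fragment
r1 = "GATTACACCGAGAG"              # fragment[0:14] with a T->A error at position 10
q1 = [30] * 14
q1[10] = 5                         # ...reported with low quality
r2 = revcomp(fragment[6:20])       # read 2 comes from the opposite strand
q2 = [30] * 14

merged = merge_pair(r1, q1, r2, q2, min_overlap=5, max_mismatch_frac=0.2)
print(merged[0] == fragment)       # True: the error in read 1 was corrected
```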

Thank you Brian!

10.0 years ago
epistatic

Would paired reads be better than merged reads for fusion/rearrangement detection? I have long overlapping paired reads and using BWA-MEM for alignment. I have been merging the overlap into a single long read and then aligning.

BWA-MEM is pretty good at reporting multiple chimeric local alignments from a single read, so it may not matter much. Still, I find it simpler to treat the reads as pairs, because merging always leaves you with two classes of reads, merged and unmerged, which need to be treated differently.
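For reference, BWA-MEM by default reports the extra pieces of a chimeric read as supplementary alignments (FLAG 0x800) and lists the other parts of the chimera in an `SA:Z` tag, whose format (`rname,pos,strand,CIGAR,mapQ,NM;` repeated) is defined in the SAM specification. Below is a minimal Python sketch of pulling those parts out of one SAM text line; the example record is made up, and a real pipeline would more likely use a library such as pysam.

```python
from typing import List, NamedTuple


class SplitPart(NamedTuple):
    rname: str
    pos: int      # 1-based leftmost position, as in SAM
    strand: str   # "+" or "-"
    cigar: str
    mapq: int
    nm: int


def is_supplementary(sam_line: str) -> bool:
    flag = int(sam_line.split("\t")[1])
    return bool(flag & 0x800)


def chimeric_parts(sam_line: str) -> List[SplitPart]:
    """Parse the SA:Z tag of one SAM record into its alignment parts."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional tags follow the 11 mandatory columns
        if tag.startswith("SA:Z:"):
            parts = []
            for entry in tag[len("SA:Z:"):].split(";"):
                if entry:  # the list ends with a trailing ';'
                    rname, pos, strand, cigar, mapq, nm = entry.split(",")
                    parts.append(SplitPart(rname, int(pos), strand, cigar, int(mapq), int(nm)))
            return parts
    return []


# Hypothetical primary record: 80 bp align to chr1, the remaining 70 bp to chr5.
record = "read1\t0\tchr1\t1000\t60\t80M70S\t*\t0\t0\t*\t*\tNM:i:0\tSA:Z:chr5,5000,+,80S70M,60,1;"
print(is_supplementary(record), chimeric_parts(record))
```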
