4.0 years ago
robert.murphy
When doing a de novo assembly, either hybrid or with just short reads, should you merge paired end short reads?
What are the pros and cons of both?
There are some threads touching on this issue but none with concise answers and information.
Any advice would be appreciated.
If your reads overlap, the inserts are too short, which tends to hurt assembly quality.
Ah so paired end reads don't overlap?
No. Overlapping read libraries are designed for specific applications, e.g. 16S sequencing.
Ah yes of course the paired end reads don't always overlap.
Why do some people merge when doing assembly then?
And when would you ever merge reads, and what purpose does it serve?
Under normal, theoretical conditions paired-end read libraries should indeed not overlap. However, even when it is not intended, they sometimes still do. Library size selection is not precise enough to give a single length; it gives a range, so some pairs may still overlap.
In that case it might be worth merging the overlapping reads first. At the very least it removes some confusion for the assembler, as it will not encounter read pairs with a negative distance.
You can also deliberately make such overlapping libraries (as @genomax already indicated). The rationale here is to get bigger pieces to feed to the assembler, as you already know they are derived from a single molecule.
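To make the "negative distance" concrete: a pair overlaps whenever the fragment is shorter than the two reads combined. Below is a minimal sketch, assuming both reads have the same length and that "fragment length" means the whole sequenced molecule. The function name and the example lengths are made up for illustration.

```python
def inner_distance(fragment_len, read_len):
    """Gap between the inner ends of the two reads of a pair.

    Assumes both reads have the same length (read_len).
    A negative value means the reads overlap by that many bases.
    """
    return fragment_len - 2 * read_len


# Hypothetical 2x150 bp run with fragments drawn from a size-selection range.
for fragment_len in (500, 300, 250):
    d = inner_distance(fragment_len, 150)
    status = "overlap" if d < 0 else "no overlap"
    print(fragment_len, d, status)
```

With a broad size-selection range, the short tail of the fragment distribution is where these overlapping pairs come from.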
Thanks for the reply and the help.
By "distance", do you mean the insert size between the forward and reverse read?
Would this not always apply when using paired-end reads?
I was under the impression all paired end reads are derived from a single molecule?
It depends on the interpretation (the terminology is confusing), but that is roughly how you can read it. More often, though, "insert size" refers to the whole length of the fragment from which the forward and reverse reads are sequenced; I would suggest looking it up for the details.
As others, such as @genomax, have indicated, it should not happen; apart from libraries designed to overlap, it is an artefact of poor library prep. So in most cases merging will not achieve much, as there should be few to no overlapping reads.
Absolutely correct. But that does not mean the assembler will successfully merge them (lots of other factors are in play here), so if you merge them beforehand you can feed them to the assembler as single-end reads (i.e. no assembly is required for that pair).
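For what merging actually does, here is a rough sketch of the idea: reverse-complement the reverse read, look for the longest suffix of the forward read that matches its prefix, and join the two into one longer sequence. This is only an illustration. The function names, the `min_overlap` and `max_mismatch` parameters, and the example sequence are assumptions made here, and dedicated read mergers (e.g. FLASH, PEAR, BBMerge) also use base qualities and are what you would use in practice.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(fwd, rev, min_overlap=10, max_mismatch=0):
    """Naively merge one read pair into a single sequence.

    Returns the merged sequence, or None if no overlap of at least
    min_overlap bases with at most max_mismatch mismatches is found.
    Quality scores are ignored in this sketch.
    """
    rev_rc = reverse_complement(rev)
    # Try the longest candidate overlap first: the suffix of the forward
    # read is compared with the prefix of the reverse-complemented read.
    for olen in range(min(len(fwd), len(rev_rc)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(fwd[-olen:], rev_rc[:olen]))
        if mismatches <= max_mismatch:
            return fwd + rev_rc[olen:]
    return None


# Made-up 40 bp fragment sequenced with 2x25 bp reads (10 bp overlap).
fragment = "ACGTTGCAAGGCTTACGATCGGATCCTAGGCATGCAATTG"
fwd = fragment[:25]
rev = reverse_complement(fragment[15:])
print(merge_pair(fwd, rev) == fragment)
```

Pairs for which `merge_pair` returns None would stay as ordinary paired-end reads; only the merged ones go to the assembler as single-end reads.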