To merge or not to merge during de novo assembly?
4.0 years ago

When doing a de novo assembly, either hybrid or with just short reads, should you merge paired end short reads?

What are the pros and cons of both?

There are some threads touching on this issue but none with concise answers and information.

Any advice would be appreciated.

Assembly • 947 views

If your paired-end reads overlap, then the inserts are too short. That may not be good for getting a good assembly.


Ah so paired end reads don't overlap?


No. Libraries with overlapping reads are designed for specific applications, e.g. 16S sequencing.


Ah yes of course the paired end reads don't always overlap.

Why do some people merge when doing assembly then?

And when would you ever merge reads, then? What purpose does it serve?


Under normal theoretical conditions, the reads of a paired-end library should indeed not overlap. However, even when it is not intended, they sometimes still do. Library size selection is not so precise that it yields a single length; it yields a range, so some read pairs may still overlap.
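To put a rough number on that: a pair overlaps whenever its fragment is shorter than the two reads combined. Here is a minimal sketch, assuming the fragment lengths are normally distributed (a simplification; real size-selection profiles are often skewed). The function name and the example library values are made up for illustration.

```python
from statistics import NormalDist


def fraction_overlapping(mean_fragment, sd_fragment, read_length):
    """Estimate the fraction of read pairs whose two reads overlap.

    A pair overlaps when its fragment is shorter than 2 * read_length.
    Assumption: fragment lengths follow a normal distribution.
    """
    fragments = NormalDist(mu=mean_fragment, sigma=sd_fragment)
    return fragments.cdf(2 * read_length)


# Hypothetical library: 2 x 150 bp reads, fragments of 400 bp +/- 50 bp
print(f"{fraction_overlapping(400, 50, 150):.1%} of pairs overlap")
```

For this made-up library about 2.3% of the pairs would overlap, so even a library that was not designed for it contains some mergeable pairs.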

In that case it may be worth merging the overlapping reads first. At the very least it removes some confusion for the assembler, which will then not encounter read pairs with a negative distance between them.

You can also deliberately make such overlapping libraries (as @genomax already indicated). The rationale here is to get bigger pieces to feed to the assembler, as you already know they are derived from a single molecule.
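To make the idea of merging concrete, here is a toy sketch of what a read merger does: reverse-complement the second read, look for the longest acceptable overlap between the end of read 1 and the start of that reverse complement, and stitch the two together. The function names, `min_overlap` and `max_mismatch_frac` are my own illustrative choices. Dedicated mergers (FLASH, PEAR, BBMerge and the like) also use base qualities and handle adapter read-through, which this sketch ignores.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a read pair into one sequence, or return None if no overlap is found.

    read2 is expected as sequenced (from the opposite strand), so it is
    reverse-complemented first. Overlaps are tried from longest to shortest,
    and the first one within the mismatch tolerance is used.
    """
    rc2 = reverse_complement(read2)
    for overlap in range(min(len(read1), len(rc2)), min_overlap - 1, -1):
        tail, head = read1[-overlap:], rc2[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * overlap:
            # Keep read1 in full and append the non-overlapping part of read2
            return read1 + rc2[overlap:]
    return None


# Hypothetical 40 bp fragment sequenced with 2 x 25 bp reads (10 bp overlap)
fragment = "ACGTTGCAAGCTTAGGCTACGATCGGATTCCGATAGCTAA"
read1 = fragment[:25]
read2 = reverse_complement(fragment[15:])
print(merge_pair(read1, read2) == fragment)
```

Pairs for which `merge_pair` returns None would simply stay as ordinary paired-end reads.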


Thanks for the reply and help.

"is not so precise that it's on a single length"

Does this mean the insert size, i.e. the distance between the forward and reverse read?

"In that case it may be worth merging the overlapping reads first. At the very least it removes some confusion for the assembler, which will then not encounter read pairs with a negative distance between them."

Would this not always apply when using paired-end reads?

"as you already know they are derived from a single molecule"

I was under the impression all paired end reads are derived from a single molecule?


"Does this mean the insert size, i.e. the distance between the forward and reverse read?"

It depends on the definition (the terminology is rather confusing), but that is roughly how you can interpret it. More often, though, "insert size" means the whole length of the fragment from which the forward and reverse reads are sequenced. I would suggest looking up the details.
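The two interpretations differ only by the read lengths, so converting between them is simple arithmetic. A small sketch, where `inner_distance` is just my name for the gap between the two reads:

```python
def inner_distance(fragment_length, read1_length, read2_length):
    """Gap between the two reads of a pair.

    Positive: unsequenced bases lie between the reads.
    Negative: the reads overlap by that many bases.
    """
    return fragment_length - (read1_length + read2_length)


print(inner_distance(500, 150, 150))  # 200: a 200 bp gap between the reads
print(inner_distance(250, 150, 150))  # -50: the reads overlap by 50 bp
```

The second case is the "negative distance" mentioned earlier: the fragment is shorter than the two reads combined.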

"Would this not always apply when using paired-end reads?"

As others, such as @genomax, have indicated, it should not happen; it is rather an artefact of poor library prep (exceptions excluded). So in most cases merging will not achieve much, as there should be few to no overlapping reads.

"I was under the impression all paired end reads are derived from a single molecule?"

Absolutely correct. But that does not mean the assembler will successfully merge them (lots of other factors are in play here), so if you merge them beforehand, you can feed them to the assembler as single-end reads (i.e. no assembly required for that pair).
