Question

reverse complement in Assembly!

9.5 years ago

midox ▴ 290

Hello,
In a paired-end assembly (two files) we have forward.fq and reverse.fq.
Do we have to reverse complement file 2 (reverse.fq) before assembly, or not? Or do we only need to reverse reverse.fq?

I know that there are assemblers that do this work automatically, but I want to understand the principle.
Thank you

Assembly paired paired end • 6.4k views

ADD COMMENT • link 9.5 years ago by midox ▴ 290

I think your problem is not really about paired ends. You need to understand the principle of double-stranded DNA first. Do you know that even for single-end reads, the assembler effectively needs to reverse complement all reads in order to find matches?

ADD REPLY • link 9.5 years ago by lh3 33k

Ram · Answer 1 · 2015-10-23

9.5 years ago

Antonio R. Franco ★ 5.2k

No, it is not necessary. If you tell the assembler on the command line that the reads are paired-end, it will handle them accordingly.

ADD COMMENT • link updated 5.5 years ago by Ram 45k • written 9.5 years ago by Antonio R. Franco ★ 5.2k

Ram · Answer 2 · 2015-10-23

The two files, forward.fq and reverse.fq, are an artifact of how the sequencing technology works. Most often (although this might not be true for some RNA-Seq libraries) you get two reads from each template you sequence, one in each file (one in forward.fq and one in reverse.fq). Relative to the template you sequenced, the forward read has exactly the same sequence as one end of the template, whereas the reverse read is the reverse complement of the other end. All genome assemblers know this, and transcriptome assemblers will often ask you to confirm that your library is in FR orientation. This is all relative to your template. BUT! DNA is double-stranded, which means that you won't only sequence the above-mentioned template; you will likely also sequence its second strand, which is its reverse complement.

Therefore, when you have a read, you don't really know which strand it comes from (unless it's RNA-Seq and you used a strand-specific protocol when preparing the library in the lab). Assemblers therefore check both orientations when looking for an overlap. That being said, when you are doing de novo assembly, you have no way of knowing which of your contigs are on the same strand as the reference and which are reverse complemented.
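To make the orientation argument concrete, here is a minimal Python sketch. It is not how any particular assembler is implemented; the function names (`simulate_fr_pair`, `best_overlap`) and the toy template are made up for illustration. It shows that read 2 of an FR pair is the reverse complement of the far end of the template, and that an overlap search has to try both orientations of a read.

```python
# Illustrative sketch only: helper names and sequences are hypothetical.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_fr_pair(template: str, read_len: int) -> tuple:
    """Simulate an FR pair: read 1 starts at the 5' end of the template,
    read 2 starts at the 5' end of the opposite strand."""
    read1 = template[:read_len]
    read2 = reverse_complement(template)[:read_len]
    return read1, read2


def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`
    (0 if no exact overlap of at least `min_len` exists)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def best_overlap(a: str, b: str, min_len: int = 3) -> tuple:
    """Try `b` as given ('+') and reverse complemented ('-'),
    because the strand of origin of a read is unknown."""
    fwd = suffix_prefix_overlap(a, b, min_len)
    rev = suffix_prefix_overlap(a, reverse_complement(b), min_len)
    return (fwd, "+") if fwd >= rev else (rev, "-")


template = "ACGTTGCAAGGCTTAACCGGTATC"
read1, read2 = simulate_fr_pair(template, 8)
# read2 only matches the template after it has been reverse complemented.
print(read1, read2, reverse_complement(read2) == template[-8:])
# The second read overlaps the first only in the '-' orientation.
print(best_overlap("ACGTTGCA", "GCCTTGCA"))
```

The point of the sketch is that nothing needs to be done to reverse.fq by hand: the orientation check is part of the overlap search itself.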

Ram · Answer 3 · 2015-10-23

9.5 years ago

midox ▴ 290

Thank you for your answer.

But that's not my question. I know that the assembler takes paired-end reads as input.

With single-end reads there is a single file, and the overlaps are found between the reads in that file. That's simple.

But here we have two paired-end files.

I want to know how overlaps are found between paired-end reads.

Is the overlap searched for in forward.fq or in reverse.fq?

I know that we can give the paired-end files to an assembler and it will do the job, but I want to understand how this works.

Thanks

ADD COMMENT • link updated 5.5 years ago by Ram 45k • written 9.5 years ago by midox ▴ 290

You should put this as a comment under my post, not as an answer, because it's not one. You may also edit your original question to ask it more clearly. I now have a feeling you may be looking for something that FLASH does: http://ccb.jhu.edu/software/FLASH/

ADD REPLY • link updated 5.5 years ago by Ram 45k • written 9.5 years ago by Biomonika (Noolean) 3.2k

Tools for merging paired-end reads solve only one part of the problem.

As you know, there are several types of paired-end reads:

r1.1 ---->...<---- r1.2
r2.1 --<-->-- r2.2
r3.1 ----><---- r3.2

Merging tools only apply to the second case.

In the second case one can create a consensus read, and the assembly can then be built from the overlaps between the consensus reads.

But what about the first and the third case?

It is a problem that no one has been able to answer.
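One way to read the three diagrams above is as a comparison between the fragment (insert) length and the read length. The sketch below is only that arithmetic; `classify_pair` is a hypothetical helper, and it assumes both reads of a pair have the same length.

```python
def classify_pair(fragment_len: int, read_len: int) -> tuple:
    """Classify a read pair by comparing fragment length with 2 x read length.

    Returns (label, size), where size is the gap or overlap in bases.
    Assumes both reads have the same length (an illustrative simplification).
    """
    if fragment_len <= 0 or read_len <= 0:
        raise ValueError("lengths must be positive")
    gap = fragment_len - 2 * read_len
    if gap > 0:
        return ("gap", gap)          # case 1: unsequenced bases between the reads
    if gap == 0:
        return ("adjacent", 0)       # case 3: the reads meet but do not overlap
    # case 2: the reads overlap; the overlap cannot exceed the fragment itself
    return ("overlap", min(-gap, fragment_len))


for fragment_len in (500, 300, 250):
    print(fragment_len, classify_pair(fragment_len, read_len=150))
```

Only the "overlap" case gives a simple merging tool any shared bases to work with, which is why the other two cases need either k-mer bridging or an assembler that uses the pairing information directly.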

ADD REPLY • link updated 5.5 years ago by Ram 45k • written 9.5 years ago by midox ▴ 290

BBMerge can merge reads that are overlapping. It can also, as in your example 1 and 3, merge reads that are nonoverlapping, if you have sufficient coverage to build a kmer bridge between them.

Usage:

bbmerge-auto.sh in1=r1.fq in2=r2.fq out=merged.fq outu=unmerged.fq extend2=20 iterations=10 k=50 ecct qtrim2=r trimq=12

That will attempt to merge the reads based on overlap. If it fails, it will trim the reads to Q12 and try again. If that fails, it will error-correct the reads (using 50-mers) and try again. If that fails, it will try to extend the reads by up to 20bp each by assembling with 50-mers, then try again after each iteration, for up to 10 iterations. If it still fails, all of the changes will be reverted and the original reads will be sent to "outu".
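For readers who want to see the basic idea behind the first step (merging by overlap), here is a toy Python sketch. It is not BBMerge's algorithm: it ignores quality scores, trimming, error correction and k-mer extension, and the name `merge_pair` and its parameters are assumptions made for this example.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 5,
               max_mismatches: int = 0) -> Optional[str]:
    """Toy overlap merge of one FR read pair.

    Read 2 is reverse complemented so both reads are on the same strand,
    then the longest suffix of read 1 matching a prefix of read 2 is used
    to join them. Returns None when no acceptable overlap is found.
    """
    r2 = reverse_complement(read2)
    for n in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        mismatches = sum(x != y for x, y in zip(read1[-n:], r2[:n]))
        if mismatches <= max_mismatches:
            # Keep read 1 in the overlap; a real tool would use base qualities.
            return read1 + r2[n:]
    return None


fragment = "ACGTTGCAAGGCTTAACCGGTATC"
# 16 bp reads from a 24 bp fragment overlap by 8 bp and can be merged.
read1, read2 = fragment[:16], reverse_complement(fragment)[:16]
print(merge_pair(read1, read2) == fragment)
# 8 bp reads from the same fragment do not overlap, so merging fails.
print(merge_pair(fragment[:8], reverse_complement(fragment)[:8]))
```

Taking the longest acceptable overlap is a simplification; in repetitive sequence it can produce false merges, which is one reason real tools score overlaps with base qualities.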

ADD REPLY • link updated 5.5 years ago by Ram 45k • written 9.5 years ago by Brian Bushnell 20k

Thank you for your answer.

That is a solution that can solve the problem.

But in your experience, is there any other way to assemble forward.fq and reverse.fq?

Or does it always have to involve merging?

Thank you

ADD REPLY • link updated 5.5 years ago by Ram 45k • written 9.5 years ago by midox ▴ 290

There are two approaches for merging reads. One is overlapping, which everybody knows. The other uses de Bruijn graphs, which seem to be used more often by today's assemblers. A Google search can do a lot for you.
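As a small illustration of why the de Bruijn approach does not need reverse.fq to be reverse complemented beforehand, the sketch below counts canonical k-mers, a common first step when building such a graph. It is a simplified example with made-up function names, not the code of any specific assembler.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """A k-mer and its reverse complement map to the same canonical form
    (here: the lexicographically smaller of the two)."""
    return min(kmer, reverse_complement(kmer))


def count_canonical_kmers(reads, k: int) -> Counter:
    """Count canonical k-mers over all reads, whatever strand they came from."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[canonical(read[i:i + k])] += 1
    return counts


forward_reads = ["ACGTTGCA", "TTGCAAGG"]
# The same reads as they would appear if sequenced from the opposite strand.
opposite_reads = [reverse_complement(r) for r in forward_reads]
# Both inputs produce identical k-mer counts.
print(count_canonical_kmers(forward_reads, 4) == count_canonical_kmers(opposite_reads, 4))
```

Because both strands collapse to the same k-mers, reads from forward.fq and reverse.fq can be fed in as they are; the pairing information is typically used later, for example when resolving repeats or scaffolding.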

ADD REPLY • link 9.5 years ago by Antonio R. Franco ★ 5.2k

Yes, I know the de Bruijn graph approach.

But what I wanted to know is how to do the merging in all cases.

ADD REPLY • link updated 5.5 years ago by Ram 45k • written 9.5 years ago by midox ▴ 290