Hello,
For a paired-end assembly we have two files, forward.fq and reverse.fq.
Do we have to reverse complement file 2 (reverse.fq) before assembly or not? Or do we only need to reverse reverse.fq?
I know that there are assemblers that do this work automatically, but I want to understand the principle.
Thank you
I think your problem is not really about paired ends. You need to understand the principle of double-stranded DNA first. Do you know that even for single-end reads, the assembler effectively needs to reverse complement all reads in order to find matches?
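To make the point about single-end reads concrete, here is a minimal sketch of an overlap check that tries a read both as given and reverse complemented. The function names, the example reads, and the exact-match rule are assumptions for illustration only; real assemblers tolerate mismatches and use indexes rather than brute force.

```python
# Translation table for complementing uppercase DNA (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def suffix_prefix_overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def best_overlap(a, b, min_len=3):
    """Try b as given ('+') and reverse complemented ('-'); keep the longer overlap."""
    same = suffix_prefix_overlap(a, b, min_len)
    flipped = suffix_prefix_overlap(a, reverse_complement(b), min_len)
    return (same, "+") if same >= flipped else (flipped, "-")

# Hypothetical reads: read_b was sequenced from the opposite strand.
read_a = "ATGGCGTACC"
read_b = reverse_complement("GTACCTTGAA")

print(best_overlap(read_a, read_b))  # (5, '-')
```

The overlap is only found after flipping read_b, which is why the strand question comes up before pairing does.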
The two files, forward.fq and reverse.fq, are an artifact of how the sequencing technology works. Most often (although this might not be true for some RNA-seq libraries) you get two reads from each template you sequence, one in each file (one in forward.fq and one in reverse.fq). Relative to the template you sequenced, the forward read has exactly the same sequence as one end of the template, whereas the reverse read is the reverse complement of the other end. All genome assemblers know that, and transcriptome assemblers will often ask you to confirm that your library is in FR orientation. This is all relative to your template. BUT! DNA is double-stranded, which means that you won't only sequence the above-mentioned template; you will likely also sequence its second strand, which is its reverse complement.
Therefore, when you have a read, you don't really know which strand it comes from (unless it's RNA-seq and you used a strand-specific protocol when preparing the library in the lab). Assemblers will therefore check both orientations when looking for an overlap. That being said, when you are doing de novo assembly, you have no way of knowing which of your contigs are on the same strand as the reference and which are reverse complemented.
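As an illustration of the FR orientation described above, the following sketch simulates one read pair from a template and then from the opposite strand of the same fragment. The template sequence, read length, and the helper name simulate_fr_pair are made up for the example; this is not how any particular sequencer or assembler is implemented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def simulate_fr_pair(template, read_len):
    """Return (read1, read2) for an FR library.

    read1 is the first read_len bases of the template;
    read2 is the reverse complement of its last read_len bases.
    """
    read1 = template[:read_len]
    read2 = reverse_complement(template[-read_len:])
    return read1, read2

template = "ACGTTGCAAGGCTTACGGAT"  # hypothetical 20 bp fragment

# Fragment sequenced from the strand shown above.
r1, r2 = simulate_fr_pair(template, 6)
print(r1, r2)  # ACGTTG ATCCGT

# The same fragment sequenced from the opposite strand: the reads swap roles.
s1, s2 = simulate_fr_pair(reverse_complement(template), 6)
print(s1, s2)  # ATCCGT ACGTTG
```

So forward.fq already contains reads from both strands, and reverse complementing read 2 simply puts it on the same strand as its own read 1. Whether you need to do that yourself depends on the assembler; most expect the files as the sequencer delivers them.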
But that's not my question. I know that assemblers take paired-end reads as input.
With single-end data there is a single file and the overlaps are computed between its reads. That is simple.
But here we have two paired-end files.
I want to know how overlaps are found between paired-end reads.
Is the overlap computed within forward.fq or within reverse.fq?
I know we can give the paired-end files to an assembler and it will do the job, but I want to understand how this works.
Thanks
written 9.1 years ago by midox; updated 5.0 years ago by Ram
You should put this as a comment under my post, not as an answer, because it isn't one. You may also edit your original question to state it more clearly. I have a feeling you may be looking for something like what FLASH does: http://ccb.jhu.edu/software/FLASH/
Tools for merging paired-end reads solve only one part of the problem.
As you know, paired-end reads come in several types:
r1.1 ---->...<---- r1.2
r2.1 --<-->-- r2.2
r3.1 ----><---- r3.2
Read-merging tools only apply to the second case.
In the second case one can create a consensus read, and the assembly can then look for overlaps between the consensus reads.
But what about the first and third cases?
That is a problem no one has been able to answer.
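The difference between the overlapping case and the gapped case can be sketched with a toy merger. This is a simplified, exact-match illustration with hypothetical names (merge_pair, min_overlap); it is not FLASH's algorithm, and real mergers typically tolerate mismatches and weigh base qualities.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(read1, read2, min_overlap=4):
    """Merge an FR pair into one sequence, or return None if no overlap is found.

    read2 is flipped onto read1's strand, then the longest exact
    suffix/prefix overlap of at least min_overlap bases is used.
    """
    mate = reverse_complement(read2)
    for n in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-n:] == mate[:n]:
            return read1 + mate[n:]
    return None

template = "ACGTTGCAAGGCTTACGGAT"  # hypothetical 20 bp fragment

# Reads of 12 bp overlap by 4 bp in the middle of the fragment.
overlapping = (template[:12], reverse_complement(template[-12:]))
# Reads of 6 bp leave an unsequenced gap of 8 bp.
gapped = (template[:6], reverse_complement(template[-6:]))

merged = merge_pair(*overlapping)
unmerged = merge_pair(*gapped)
print(merged)    # ACGTTGCAAGGCTTACGGAT
print(unmerged)  # None
```

When merging fails, the pair is still useful to an assembler: the two reads are assembled like any others, and the known orientation and approximate distance between them can be used afterwards to link or order contigs.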
BBMerge can merge reads that are overlapping. It can also, as in your examples 1 and 3, merge reads that are non-overlapping, if you have sufficient coverage to build a k-mer bridge between them.
That will attempt to merge the reads based on overlap. If it fails, it will trim the reads to Q12 and try again. If that fails, it will error-correct the reads (using 50-mers) and try again. If that fails, it will try to extend the reads by up to 20 bp each by assembling with 50-mers, then try again after each iteration, for up to 10 iterations. If it still fails, all of the changes will be reverted and the original reads will be sent to "outu".
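The idea of a k-mer bridge can be sketched as follows. This is a toy illustration under strong assumptions (error-free reads, a tiny k, exact matching, extension of read 1 only), and the names kmer_set and bridge_pair are hypothetical; it is not BBMerge's actual implementation.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def kmer_set(reads, k):
    """Collect every k-mer of every read, on both strands."""
    kmers = set()
    for read in reads:
        for s in (read, reverse_complement(read)):
            for i in range(len(s) - k + 1):
                kmers.add(s[i:i + k])
    return kmers

def bridge_pair(read1, read2, kmers, k, max_extend=20):
    """Extend read1 base by base until it reaches the start of its mate.

    Extension continues only while exactly one next k-mer exists in kmers.
    Returns the bridged sequence, or None on a dead end, a branch,
    or when max_extend bases were added without reaching the mate.
    """
    mate = reverse_complement(read2)
    seq = read1
    for _ in range(max_extend + 1):
        if seq.endswith(mate[:k]):
            return seq + mate[k:]
        options = [b for b in "ACGT" if seq[-(k - 1):] + b in kmers]
        if len(options) != 1:
            return None
        seq += options[0]
    return None

template = "GATTACAGGCTCCAAGTTCG"  # hypothetical 20 bp fragment
read1 = template[:8]
read2 = reverse_complement(template[-8:])
# A read from another fragment, sequenced from the opposite strand,
# that happens to cover the gap between read1 and read2.
spanning = reverse_complement(template[4:16])

k = 5
bridged = bridge_pair(read1, read2, kmer_set([read1, read2, spanning], k), k)
alone = bridge_pair(read1, read2, kmer_set([read1, read2], k), k)
print(bridged)  # GATTACAGGCTCCAAGTTCG
print(alone)    # None
```

Without other reads covering the gap there is nothing to extend with, which is why coverage matters for non-overlapping pairs.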
But in your experience, is there any other way to assemble forward.fq and reverse.fq?
Or must the reads always be merged?
Thank you
There are two approaches for merging reads. One is the overlap approach that everybody knows. The other uses de Bruijn graphs, which seems to be the more common choice in today's assemblers. A Google search can do a lot for you.
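For the de Bruijn approach, here is a minimal sketch showing that the two reads of a pair can go into the graph without being merged first: each read contributes its k-mers on both strands. The function names, the tiny k, and the error-free reads are assumptions for illustration; production assemblers add error correction, graph simplification, and use of the pairing information.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read.

    Every read is added on both strands, so it does not matter which
    strand or which file (forward or reverse) it came from.
    """
    graph = defaultdict(set)
    for read in reads:
        for s in (read, reverse_complement(read)):
            for i in range(len(s) - k + 1):
                kmer = s[i:i + k]
                graph[kmer[:-1]].add(kmer[1:])
    return graph

def walk(graph, start, max_steps=1000):
    """Follow edges from start while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    for _ in range(max_steps):
        successors = graph.get(node, ())
        if len(successors) != 1:
            break
        (node,) = successors
        if node in seen:  # stop on a cycle
            break
        seen.add(node)
        contig += node[-1]
    return contig

template = "GATTACAGGCTCCAAGTTCG"  # hypothetical 20 bp fragment
k = 5
read1 = template[:12]                       # as it would appear in forward.fq
read2 = reverse_complement(template[8:])    # as it would appear in reverse.fq

graph = de_bruijn([read1, read2], k)
contig = walk(graph, template[:k - 1])
print(contig)  # GATTACAGGCTCCAAGTTCG
```

Starting the walk from the other end of the graph yields the reverse complement of the same contig, which is the reason a de novo contig can come out on either strand.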