Does any one know whether a assembler can get the accurate assembly of the two ends of the sequence?
Thanks!
Sorry for not making myself clear. Actually, I want to know how the assemblers deal with the end of sequence. How do they know they have reached the end of the sequence while doing assembly? Can they give out accurate end of the sequence?
Why would you need an assembler if you are referring to stitching together R1/R2 reads from a fragment, if they overlap in the middle. bbmerge.sh from BBMap suite or FLASH are appropriate tools to do that read merge.
If you are referring to assembling two ends of a contig to get a closed circle then that is a different question. Please add some additional explanation by editing this post.
I edited this post. Actually, I sequenced the whole genome of a virus. I want to know whether de novo assembly can give out accurate sequence at the end of the genome.
Tadpole seems to do a very good job of viral assembly, so I recommend it for that purpose. Your post's title is misleading, though:
"Assembly of two ends of the sequence"
As far as I can tell, this has nothing to do with your goal, so I suggest you change it. Tadpole can typically generate a good viral assembly from shotgun reads, but your title is completely unrelated to this task.
Edit - I misinterpreted your question. It appears that you want to assemble the outermost ends of your genome... I interpreted as being related to read pairs. Anyway, Tadpole has no problem assembling the ends of viral contigs, as long as they are concordant.
While the ends part is true if OP is willing to try tadpole then it would not hurt to do a second assembly with BBtools. Who knows OP may get a better assembly.
BTW, I was wondering whether anyone can tell me how the assemblers deal with the end of sequence. How do they know they have reached the end of the sequence and can they give out accurate end sequence?
Have you benchmarked this against other read-merging programs, to see how well it performs, in terms of true-positive and false-positive merges? Or, in terms of speed? From reading the code, it does not appear that it would perform very well by either metric...
Why would you need an assembler if you are referring to stitching together R1/R2 reads from a fragment, if they overlap in the middle.
bbmerge.sh
from BBMap suite or FLASH are appropriate tools to do that read merge.If you are referring to assembling two ends of a contig to get a closed circle then that is a different question. Please add some additional explanation by editing this post.
I edited this post. Actually, I sequenced the whole genome of a virus. I want to know whether de novo assembly can give out accurate sequence at the end of the genome.
Besides, the two ends of the genome have relative low coverage than the middle part.
You may want to take a look at
tadpole.sh
from BBMap suite for doing the assembly (denovo-assembly with novel genes inserted ).Tagging: Brian Bushnell
Brian can provide additional insight.
Tadpole seems to do a very good job of viral assembly, so I recommend it for that purpose. Your post's title is misleading, though:
"Assembly of two ends of the sequence"
As far as I can tell, this has nothing to do with your goal, so I suggest you change it. Tadpole can typically generate a good viral assembly from shotgun reads, but your title is completely unrelated to this task.
Edit - I misinterpreted your question. It appears that you want to assemble the outermost ends of your genome... I interpreted as being related to read pairs. Anyway, Tadpole has no problem assembling the ends of viral contigs, as long as they are concordant.
While the ends part is true if OP is willing to try
tadpole
then it would not hurt to do a second assembly with BBtools. Who knows OP may get a better assembly.please provide more details regarding your data.