2.0 years ago
mickalideh
Hi all, here is the issue:
I have a set of PacBio CCS long reads, roughly 7 kb each, that for the most part completely overlap one another. Therefore, if one were to assemble them, the assembly would not be much longer than the reads themselves.
I believe Canu has some trouble when this is the case.
Does anyone know a good assembly algorithm for this case?
The question could be reframed as: what is a good algorithm for getting a consensus long read from a set of mostly overlapping long reads?
If the reads are almost as long as the assembly, why not try a multiple/progressive sequence alignment?
I am not familiar with this technique, but I do not see how it would lead to the desired result, which is a contig that is more accurate than any of the individual reads.
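To show how an alignment can yield a consensus that is more accurate than any single read, here is a minimal sketch of column-wise majority voting. It assumes the reads have already been put into a multiple sequence alignment by some MSA tool (not shown), so every row has the same length and gaps are written as `-`. The function name `majority_consensus` and the toy reads are hypothetical. This is plain majority voting, not how Canu, hifiasm or any particular polishing tool computes its consensus.

```python
from collections import Counter


def majority_consensus(aligned_reads: list[str], gap: str = "-") -> str:
    """Column-wise majority vote over reads that are already aligned.

    Assumes all rows have equal length and gaps are written as `gap`.
    Ties are broken arbitrarily (by first occurrence in the column).
    """
    if not aligned_reads:
        raise ValueError("need at least one aligned read")
    width = len(aligned_reads[0])
    if any(len(r) != width for r in aligned_reads):
        raise ValueError("aligned reads must all have the same length")

    out = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        # A column where most reads have a gap is treated as a spurious
        # insertion in the minority of reads and is dropped.
        if base != gap:
            out.append(base)
    return "".join(out)


# Hypothetical toy reads: each carries one random substitution at a different spot.
reads = [
    "ACGAACGTAC",  # T->A at position 3
    "ACGTACCTAC",  # G->C at position 6
    "TCGTACGTAC",  # A->T at position 0
    "ACGTACGTAG",  # C->G at position 9
]
print(majority_consensus(reads))  # → ACGTACGTAC
```

Every toy read has one error, yet the consensus has none, because the errors fall in different columns and are outvoted. With real data the hard part is producing the alignment itself, especially around indels.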
So basically you want a Consensus sequence of Consensus sequences? Seems redundant; are you sure you really need assembly at this point?
What kind of data is this, amplicon? plasmid? is it circular?
If you find you still want to assemble, I would start out with hifiasm: https://github.com/chhylp123/hifiasm
These are PacBio HiFi reads from human. I am reasonably confident that they come from the same place on the same chromosomal copy.
Each read has a few random errors, so the purpose of the assembly is to take the consensus of these reads and thereby eliminate the random errors.
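A rough way to see why a consensus removes random errors is a binomial estimate. This is a sketch under simplifying assumptions: each read independently has an error at a given site with probability `p` (the value is illustrative, not measured from this data), and the vote at that site is counted as failed whenever at least half the reads are wrong. That is pessimistic, since random errors rarely all agree on the same wrong base. The function name `majority_error_bound` is hypothetical.

```python
from math import comb


def majority_error_bound(n_reads: int, p: float) -> float:
    """Upper bound on the chance a majority vote at one site is wrong.

    Assumes independent per-read errors with probability p and counts every
    case where at least ceil(n_reads / 2) reads are wrong as a failure.
    """
    if n_reads < 1 or not 0.0 <= p <= 1.0:
        raise ValueError("need n_reads >= 1 and 0 <= p <= 1")
    k_min = (n_reads + 1) // 2  # ceil(n_reads / 2)
    return sum(
        comb(n_reads, k) * p**k * (1 - p) ** (n_reads - k)
        for k in range(k_min, n_reads + 1)
    )


# Illustrative per-site error rate of 1% per read.
for n in (1, 3, 5, 9):
    print(n, majority_error_bound(n, 0.01))
```

With `p = 0.01`, a single read is wrong at a site 1% of the time, while three reads give a bound of about 3e-4, and the bound keeps shrinking as more reads are added. Systematic errors shared across reads (for example in homopolymers) violate the independence assumption and are not removed this way.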
Thank you for the recommendation of hifiasm.