What is a good algorithm for assembling long reads when the intended assembly length is approximately equal to the read length?
2.0 years ago
mickalideh • 0

Hi all, here is the issue:

I have a set of PacBio CCS long reads, roughly 7 kb each, that for the most part overlap each other completely. Therefore, if one were to assemble them, the assembly would not be much longer than the reads themselves.

I believe Canu has some trouble when this is the case.

Does anyone know a good assembly algorithm for this case?

The question could be reframed as: what is a good algorithm for obtaining a consensus long read from a set of mostly overlapping long reads?

pacbio assembly longread • 1.4k views

If the reads are almost as long as the assembly, why not try a multiple (progressive) sequence alignment?
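
A minimal sketch of the alignment-then-vote idea, under stated assumptions: instead of a full progressive alignment it uses a simpler star alignment, aligning every read to the first read (the "seed") with textbook Needleman-Wunsch global alignment and then taking a per-position majority vote, including votes on insertions relative to the seed. The names `global_align` and `star_consensus` and the match/mismatch/gap scores are hypothetical, illustrative choices, not taken from any tool. The alignment is O(n*m) in pure Python, so this only suits short toy sequences, not 7 kb reads; practical consensus tools use banded or partial-order alignment.

```python
from collections import Counter

# Illustrative scores (assumption): not tuned for HiFi error profiles.
MATCH, MISMATCH, GAP = 1, -1, -1


def _sub(x: str, y: str) -> int:
    """Substitution score for aligning base x against base y."""
    return MATCH if x == y else MISMATCH


def global_align(a: str, b: str) -> tuple[str, str]:
    """Needleman-Wunsch global alignment of a and b; returns the gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + _sub(a[i - 1], b[j - 1]),
                score[i - 1][j] + GAP,  # gap in b
                score[i][j - 1] + GAP,  # gap in a
            )
    # Trace back from the bottom-right cell, preferring the diagonal on ties.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + _sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def star_consensus(reads: list[str]) -> str:
    """Align every read to reads[0] and majority-vote each seed position."""
    if not reads:
        return ""
    seed = reads[0]
    # base_votes[k]: base each read aligns to seed position k ("-" = deleted).
    base_votes = [Counter() for _ in seed]
    # ins_votes[k]: bases each read inserts just before seed position k
    # ("" = no insertion); index len(seed) holds insertions after the seed end.
    ins_votes = [Counter() for _ in range(len(seed) + 1)]
    for read in reads:
        seed_aln, read_aln = global_align(seed, read)
        k, inserted = 0, []
        for s, r in zip(seed_aln, read_aln):
            if s == "-":
                inserted.append(r)  # read base with no seed counterpart
            else:
                ins_votes[k]["".join(inserted)] += 1
                base_votes[k][r] += 1
                inserted = []
                k += 1
        ins_votes[k]["".join(inserted)] += 1
    consensus = []
    for k in range(len(seed) + 1):
        consensus.append(ins_votes[k].most_common(1)[0][0])
        if k < len(seed):
            base = base_votes[k].most_common(1)[0][0]
            if base != "-":
                consensus.append(base)
    return "".join(consensus)


reads = [
    "ACGTTCGTAC",   # substitution at position 4 (A -> T); used as the seed
    "ACGTACGTAC",   # error-free
    "ACGTACGAC",    # deletion of the T at position 7
    "ACGTACCGTAC",  # extra C inserted
    "ACGAACGTAC",   # substitution at position 3 (T -> A)
]
print(star_consensus(reads))  # ACGTACGTAC
```

The seed does not have to be error-free: in this toy example the seed's own substitution is outvoted by the other reads. Ties in the vote are broken arbitrarily, so with few reads the result can depend on read order.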

I am not familiar with this technique, but I do not see how it would lead to the desired result, which is a contig that is more accurate than any of the individual reads.

So basically you want a consensus sequence of consensus sequences (CCS reads are already circular consensus sequences)? That seems redundant; are you sure you really need assembly at this point?

What kind of data is this: amplicon? plasmid? Is it circular?

If you find you still want to assemble, I would start out with hifiasm: https://github.com/chhylp123/hifiasm

These are PacBio HiFi reads from human. I am reasonably confident that they come from the same locus on the same chromosomal copy.

Each read has a few random errors, so the purpose of the assembly is to take the consensus of these reads and thereby eliminate the random errors.
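
Why a consensus removes random errors can be made concrete with a small calculation. The Python sketch below assumes, as a simplification, that each of n reads is wrong at a given position independently with the same per-base error rate p. A plain majority vote is guaranteed to be right whenever more than half of the reads carry the true base, so the binomial probability that at most half of them do is an upper bound on the chance that the vote fails at that position. The function name is hypothetical. Real HiFi errors are not fully independent (residual errors tend to cluster in homopolymers), so treat the numbers as optimistic.

```python
from math import comb


def majority_vote_error_bound(n_reads: int, per_base_error: float) -> float:
    """Upper bound on P(majority vote is wrong at one position).

    Assumes independent errors with a common rate (a simplification).
    The vote can only fail if at most n_reads // 2 reads are correct,
    so we sum the binomial probabilities of those outcomes.
    """
    if n_reads < 1 or not 0.0 <= per_base_error <= 1.0:
        raise ValueError("need n_reads >= 1 and 0 <= per_base_error <= 1")
    p_ok = 1.0 - per_base_error
    return sum(
        comb(n_reads, k) * p_ok**k * per_base_error ** (n_reads - k)
        for k in range(n_reads // 2 + 1)  # k = number of correct reads
    )


for n in (1, 3, 5):
    print(n, majority_vote_error_bound(n, 0.01))
```

For example, with a per-base error rate of 0.01, three reads bring the bound down to about 3e-4 and five reads to roughly 1e-5.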

Thank you for the recommendation of hifiasm.
