Dear Fellows,
Recently, I started using nanopore sequencing, with canu for de novo assembly. After reading the manual and doing some test runs, I am curious about how canu corrects raw nanopore data.
By default, canu corrects only the longest reads, up to 40x coverage, and each corrected read is generated as a consensus of other reads. How is that consensus generated: from the overlaps of all the raw reads, or only from the overlaps among the longest, 40x-coverage reads? My guess is that the overlaps from all the raw reads are used to correct the selected longest reads, but I hope someone more experienced can confirm this.
Additionally, since indels are common in nanopore reads, it is reasonable to expect gaps in some columns of the overlap alignment. How does the correction handle the gaps in a column? Are gaps counted when the consensus is generated, so that they can even lead to a base being deleted from the corrected read, or are gap characters ignored so that only A, T, C, and G are used to generate the consensus?
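To make the gap question concrete, here is a toy Python sketch of the two alternatives I have in mind. It is only my own illustration and not canu's actual code: the function name `column_consensus`, the `count_gaps` flag, and the simple per-column majority vote are all assumptions on my part.

```python
from collections import Counter

def column_consensus(rows, count_gaps):
    """Toy majority-vote consensus over equal-length aligned rows.

    '-' marks a gap. This is a hypothetical illustration of the question,
    not how canu is known to work.
    """
    consensus = []
    for column in zip(*rows):
        votes = Counter(column)
        if not count_gaps:
            # Alternative 2: ignore gap characters, vote only among bases.
            votes.pop("-", None)
        if not votes:
            # Column held only gaps and gaps are ignored: emit nothing.
            continue
        winner = votes.most_common(1)[0][0]
        if winner != "-":
            consensus.append(winner)
        # Alternative 1: if the gap wins the vote, the base is dropped.
    return "".join(consensus)

# Three aligned reads; two of them have a gap in the third column.
rows = ["ACGTA", "AC-TA", "AC-TA"]
print(column_consensus(rows, count_gaps=True))   # gap wins column 3: ACTA
print(column_consensus(rows, count_gaps=False))  # gaps ignored: ACGTA
```

In the first call the gap outvotes the G and the base is deleted from the result; in the second the G is kept. My question is which of these behaviours, if either, is closer to what canu does.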
Since I am still new to this area, I really hope the experts here can help me with these basic questions. Many thanks for your kindness.
Thanks.