3.5 years ago
kamanovae
Hi!
I have Nanopore sequencing data consisting of a 3.6 GB FASTA file. Long-read assembly with Canu and Flye produced very different numbers of contigs: 1132 contigs (22 Mb) with Canu versus 8677 contigs (600 Mb) with Flye. The genome size of the sample I am interested in is approximately 600 Mb.
I'm trying to figure out why the results are so different. Thank you for your help!
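Numbers like these are easiest to compare when both assemblies are summarized the same way. Below is a minimal sketch (Python, standard library only) that counts contigs, sums their lengths and reports the longest contig from an uncompressed assembly FASTA. It assumes headers start with `>` and that sequences may span several lines. The script name and file paths in the usage line are hypothetical.

```python
import sys


def contig_lengths(lines):
    """Return the length of each record in FASTA text (headers start with '>')."""
    lengths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            lengths.append(0)  # a header starts a new contig
        elif lengths:
            lengths[-1] += len(line)  # multi-line sequences are summed
    return lengths


def summarize(lengths):
    """Contig count, total assembled length and longest contig, in bp."""
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest_bp": max(lengths, default=0),
    }


# Only run as a CLI when file paths are given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, summarize(contig_lengths(handle)))
```

Usage (hypothetical paths): `python assembly_stats.py canu/asm.contigs.fasta flye/assembly.fasta`. A total far below the expected genome size, as with the 22 Mb Canu result, means much of the genome was not assembled at all. Contig count alone does not capture this.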
Most likely your coverage is too low. Assuming you really have 3.6 GB of FASTA reads (why not FASTQ? Are you sure the file isn't compressed?), that gives 3600 MB / 600 Mb ≈ 6X coverage, which is not much.
Most assemblers want 30-60X coverage or more to perform well.
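The file-size estimate above can be checked directly. Below is a minimal sketch (Python, standard library only) that counts only sequence bases, skipping `>` header lines and newlines, which inflate the raw file size. It then divides by the genome size to get mean depth. It assumes a FASTA input and treats a `.gz` suffix as gzip-compressed; the script name and command-line layout are hypothetical.

```python
import gzip
import sys


def count_bases(lines):
    """Sum sequence lengths in FASTA text, ignoring '>' header lines."""
    total = 0
    for line in lines:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def open_reads(path):
    """Open plain or gzip-compressed FASTA (compression assumed from '.gz')."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def estimate_coverage(total_bases, genome_size_bp):
    """Mean depth = total sequenced bases / genome size."""
    return total_bases / genome_size_bp


# Only run as a CLI when both arguments are given: <reads.fasta[.gz]> <genome_size_bp>
if __name__ == "__main__" and len(sys.argv) == 3:
    genome_size_bp = float(sys.argv[2])  # e.g. 600e6 for a ~600 Mb genome
    with open_reads(sys.argv[1]) as handle:
        bases = count_bases(handle)
    print(f"{bases} bases -> ~{estimate_coverage(bases, genome_size_bp):.1f}X")
```

For the numbers in the question, `estimate_coverage(3.6e9, 600e6)` returns `6.0`. If the file turns out to be compressed, the real base count, and therefore the coverage, could be several times higher than the file size suggests.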
Have a look at Canu's documentation (linked from its GitHub page); it describes settings for low-coverage data.
Good luck!