4 weeks ago
Jason
I now have Illumina and Nanopore sequencing data for a mutant strain of Corynebacterium glutamicum, as well as a high-quality whole-genome reference for the wild type.
Because the DNA from the mutant strain is of poor quality, the N50 of the Nanopore data is very low.
How can I obtain a circular plot (a high-quality whole genome of the mutant) from these data? Which software or workflow would you recommend?
Thanks
Description of ONT data:
General summary:
Mean read length: 566.4
Mean read quality: 9.3
Median read length: 435.0
Median read quality: 9.2
Number of reads: 169,528.0
Read length N50: 635.0
STDEV read length: 1,933.0
Total bases: 96,018,294.0
Number, percentage and megabases of reads above quality cutoffs
>Q5: 169523 (100.0%) 96.0Mb
>Q7: 168539 (99.4%) 95.9Mb
>Q10: 39622 (23.4%) 30.7Mb
>Q12: 1153 (0.7%) 1.2Mb
>Q15: 6 (0.0%) 0.2Mb
Top 5 highest mean basecall quality scores and their read lengths
1: 21.9 (38688)
2: 19.7 (73373)
3: 19.5 (16502)
4: 17.4 (21180)
5: 16.0 (1654)
Top 5 longest reads and their mean basecall quality score
1: 527400 (8.8)
2: 469123 (8.5)
3: 107070 (13.1)
4: 98992 (8.9)
5: 84108 (8.4)
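For context, the summary above can be turned into a rough sequencing depth. This is only a sketch: the genome size of about 3.3 Mb is an assumption (roughly what is usually quoted for C. glutamicum, not a figure given in this thread), and `estimate_coverage` is a hypothetical helper name.

```python
def estimate_coverage(total_bases: float, genome_size: float) -> float:
    """Return mean depth as total sequenced bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Values copied from the read summary above.
TOTAL_BASES = 96_018_294      # "Total bases"
BASES_ABOVE_Q10 = 30_700_000  # ">Q10: ... 30.7Mb"

# ASSUMPTION: approximate genome size, not stated in the thread.
ASSUMED_GENOME_SIZE = 3_300_000

depth_all = estimate_coverage(TOTAL_BASES, ASSUMED_GENOME_SIZE)
depth_q10 = estimate_coverage(BASES_ABOVE_Q10, ASSUMED_GENOME_SIZE)

print(f"All reads:  ~{depth_all:.0f}x")
print(f"Reads >Q10: ~{depth_q10:.0f}x")
```

Under that assumed genome size, this works out to roughly 29x from all reads and about 9x from reads above Q10.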
So, in essence, you want software that turns bad-quality data into good-quality data?
Hi Michael, thanks for your reply. I want to use the poor ONT data plus the Illumina data to assemble a whole-genome sequence of the mutant strain. Luckily, I have a complete reference genome of the wild-type strain. So I was wondering whether I can combine the sequencing data and the reference genome to obtain a whole genome of the mutant. Thanks :)
I think I am possibly missing the point here, but to me the obvious answer is to try to get high-quality data in the first place.
Btw, I have worked in the same group that made (one of) the first C. glutamicum reference genomes (in 2003). C. glutamicum is easy to culture, which is why it is used in biotechnological processes. With current ONT sequencing, you can create a genome assembly that is on par with or better than the reference:
Possibly the data is the result of a first sequencing attempt, but remember that Q10 corresponds to a 10% error rate. Combined with the short read lengths, you are not getting much out of the data anyway. You won't get reliable SNVs or structural variants out of this, and the reference won't help you much.
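To make the Q10 remark concrete, here is a minimal sketch of the standard Phred conversion, error probability = 10^(-Q/10), applied to the mean values from the read summary. The function names are hypothetical and chosen only for illustration.

```python
def phred_to_error(q: float) -> float:
    """Convert a Phred quality score into a per-base error probability."""
    return 10 ** (-q / 10)


def expected_errors(read_length: float, q: float) -> float:
    """Expected number of miscalled bases in a read of the given length."""
    return read_length * phred_to_error(q)


# Q10 is one error in ten bases.
print(f"Q10 error rate: {phred_to_error(10):.0%}")

# Mean read length and mean read quality from the summary in the question.
mean_length = 566.4
mean_quality = 9.3
print(f"Error rate at Q{mean_quality}: {phred_to_error(mean_quality):.1%}")
print(f"Expected errors per mean-length read: "
      f"{expected_errors(mean_length, mean_quality):.1f}")
```

At the reported mean quality of 9.3, that is somewhat more than one error in every ten bases, so a typical read of about 566 bases would be expected to carry over 60 errors.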
If there is no option to generate good-quality HMW DNA, use only the Illumina sequences and run variant detection.
If that cannot be done either, refuse.
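The Illumina-only route could look roughly like the sketch below. It only builds the command lines and does not run them. The choice of bwa, samtools and bcftools is an assumption (the thread does not name a variant caller), and all file names as well as the helper `build_variant_calling_commands` are hypothetical.

```python
import shlex


def build_variant_calling_commands(reference, reads_1, reads_2, prefix):
    """Return shell commands for a simple haploid variant-calling run.

    ASSUMPTION: paired-end Illumina reads mapped with bwa mem and
    variants called with bcftools; adapt to whatever tools you use.
    """
    ref, r1, r2 = (shlex.quote(p) for p in (reference, reads_1, reads_2))
    bam = shlex.quote(f"{prefix}.sorted.bam")
    vcf = shlex.quote(f"{prefix}.vcf.gz")
    return [
        # Index the wild-type reference for mapping.
        f"bwa index {ref}",
        # Map the mutant reads and sort the alignments by coordinate.
        f"bwa mem {ref} {r1} {r2} | samtools sort -o {bam} -",
        f"samtools index {bam}",
        # Call variants only (-v) with the multiallelic caller (-m),
        # treating the bacterial genome as haploid.
        f"bcftools mpileup -f {ref} {bam} | "
        f"bcftools call -mv --ploidy 1 -Oz -o {vcf}",
    ]


# Hypothetical file names, for illustration only.
commands = build_variant_calling_commands(
    "wildtype.fasta", "mutant_R1.fastq.gz", "mutant_R2.fastq.gz", "mutant"
)
for command in commands:
    print(command)
```

The result is a VCF of differences between the mutant and the wild-type reference, not an assembled mutant genome.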
If you just want to compare your genomes, you can use any of a list of alignment tools. mauve or mummer are great for this, but if you want to visualise with something like circos, I have previously used Satsuma2, the output of which can easily be adapted into circos input.
But @michael's point is valid. You're not going to improve your assembly if you have poor input data. Reference-guided approaches come with a whole series of issues - there are plenty of discussions about that on this site if you search.
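As an illustration of adapting aligner output into circos input, here is a sketch that uses mummer coordinates rather than Satsuma2 output. It assumes tab-delimited lines as written by `show-coords -T -H`, with the columns S1, E1, S2, E2, LEN1, LEN2, %IDY, reference ID, query ID; check this against your own file before relying on it. The function name and the `min_length` filter are hypothetical.

```python
def coords_to_circos_links(lines, min_length=1000):
    """Convert tab-delimited alignment coordinates into circos link lines.

    ASSUMED input columns (tab-separated):
    S1, E1, S2, E2, LEN1, LEN2, %IDY, reference ID, query ID.
    Each output line has the form: ref start end query start end.
    """
    links = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip blank or malformed lines
        try:
            s1, e1, s2, e2 = (int(value) for value in fields[:4])
        except ValueError:
            continue  # skip header lines if present
        if abs(e1 - s1) + 1 < min_length:
            continue  # drop short matches to keep the plot readable
        ref_id, query_id = fields[7], fields[8]
        # Query coordinates keep their original order, so reverse-strand
        # matches still have start > end.
        links.append(
            f"{ref_id} {min(s1, e1)} {max(s1, e1)} {query_id} {s2} {e2}"
        )
    return links


example = [
    "1\t5000\t10\t5009\t5000\t5000\t99.50\twt\tmut",
    "100\t200\t300\t400\t101\t101\t98.00\twt\tmut",
]
for link in coords_to_circos_links(example):
    print(link)
```

The IDs in the first and fourth columns must match the names used in your circos karyotype file.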
Thanks! That is quite helpful. The big problem for me is the DNA quality for sequencing. I only have access to DNA that has undergone industrial treatment, so the N50 is normally around 600-700 bp. Is it therefore impossible to get a high-quality circular whole genome from these data? Could I improve it by accumulating more data (multiple sequencing runs of different DNA preparations of the mutant)?