Hi reader,
I have some nanopore-sequenced fungal samples, sequenced on an R10.4 flow cell with the LSK114 kit.
Some initial information:
- Median read length (average): 8055 bp
- N50 (average): 11109 bp
- Median quality (average): 16.7
Question 1: I am not sure what the term "error rate" means in sequencing. Can you point me to somewhere I can read about it, and explain how to calculate it for my data sets?
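For what it is worth, one common way to get a rough error-rate estimate is from the quality scores themselves: a Phred score Q corresponds to an error probability of 10^(-Q/10), and averaging those probabilities over all bases gives an expected error rate. Below is a minimal sketch of that idea, assuming standard 4-line FASTQ records with Phred+33 encoding; the function names are made up for illustration, and this is only the basecaller's own estimate, not an error rate measured against a reference.

```python
def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def fastq_error_rate(lines, offset=33):
    """Expected per-base error rate over an iterable of FASTQ lines.

    Assumes 4-line records (header, sequence, '+', quality) and
    Phred+33 quality encoding.
    """
    total_error = 0.0
    total_bases = 0
    for i, line in enumerate(lines):
        if i % 4 == 3:  # every fourth line is the quality string
            qual = line.rstrip("\n")
            total_error += sum(phred_to_error(ord(ch) - offset) for ch in qual)
            total_bases += len(qual)
    return total_error / total_bases if total_bases else float("nan")


# Toy records: '5' encodes Q20 and '?' encodes Q30 under Phred+33.
records = ["@read1", "ACGT", "+", "5555", "@read2", "ACGT", "+", "????"]
rate = fastq_error_rate(records)
print(f"expected error rate: {rate:.4f}")
```

For a real file you would pass an open file handle instead of the toy list. Note that the probabilities are averaged, not the Q-scores, because Q is on a log scale.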
The sequencing company already basecalled the data with Dorado and provided FASTQ files as pass/fail. I want to generate assemblies, so I am using the FASTQ pass files for downstream analysis.
I have used Porechop to remove adapters and Chopper to remove reads with Q<10 and length <2000.
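To make the filtering criteria concrete, here is a small sketch of what "drop reads with Q<10 and length <2000" means per read. It is not Chopper's implementation; it assumes Phred+33 encoding and that the mean read quality is computed by averaging error probabilities and converting back to a Q-score, and the function names are hypothetical.

```python
import math


def mean_quality(quality_string, offset=33):
    """Mean read quality, averaged in error-probability space."""
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def keep_read(seq, qual, min_q=10, min_len=2000):
    """True if the read passes both the length and the quality threshold."""
    return len(seq) >= min_len and mean_quality(qual) >= min_q


def filter_fastq(lines, min_q=10, min_len=2000):
    """Yield (header, seq, qual) for 4-line FASTQ records that pass."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, _, qual = record
            if keep_read(seq, qual, min_q, min_len):
                yield header, seq, qual
            record = []


# Toy input: '5' encodes Q20 and '&' encodes Q5 under Phred+33.
reads = [
    "@long_good", "A" * 2500, "+", "5" * 2500,
    "@short", "A" * 500, "+", "5" * 500,
    "@long_poor", "A" * 2500, "+", "&" * 2500,
]
kept = [header for header, _, _ in filter_fastq(reads)]
print(kept)
```

Only the first toy read survives: the second is too short and the third has a mean quality below 10.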
I am using Flye 2.9.5-b1801 for genome assembly. I have a question about selecting the read-type option.
Question 2: Of these two options, which is more suitable for my data?
--nano-raw path [path ...]
ONT regular reads, pre-Guppy5 (<20% error)
--nano-hq path [path ...]
ONT high-quality reads: Guppy5+ SUP or Q20 (<5% error)
Also, is the --scaffold option in the Flye assembler recommended?
Thank you.
Thank you for the clarification.
Based on my raw data I get error rates of around 2–2.2%. Should I use --nano-hq or --nano-raw? What do you suggest, based on your experience?
If the data has been basecalled with dorado using the super accuracy (SUP) or high accuracy (HAC) models, then use --nano-hq.
Hi. What if I don't know which dorado model was used? Is there any way to find that out from the fast5 or FASTQ files, or from the summary file created for each sample?
The FASTQ header should contain the model used for basecalling; I see that in the rebasecalled files I work with. The Q-score distribution should be better than Q20 (on average) if the data is high or super accuracy.
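If it helps, pulling the model out of the header can be scripted. The sketch below assumes the header carries a `basecall_model_version_id=...` tag, which is the tag name and example value reported further down in this thread; other basecaller versions may label it differently. The read ID in the example is made up, and reading the accuracy tier from the token before `@` is an assumption based on that one model name.

```python
import re

# Tag name taken from the header quoted in this thread; an assumption
# for files produced by other basecaller versions.
MODEL_TAG = re.compile(r"basecall_model_version_id=(\S+)")


def basecall_model(header):
    """Return the model string from a FASTQ header line, or None."""
    match = MODEL_TAG.search(header)
    return match.group(1) if match else None


def first_model(lines):
    """Return the first model found in the header lines of 4-line FASTQ records."""
    for i, line in enumerate(lines):
        if i % 4 == 0:  # header lines only
            model = basecall_model(line)
            if model:
                return model
    return None


def accuracy_tier(model):
    """Guess the tier (e.g. 'hac', 'sup') from the last token before '@'."""
    return model.split("@")[0].rsplit("_", 1)[-1]


# Hypothetical read ID; the model value is the one quoted in this thread.
fastq = [
    "@read-0001 basecall_model_version_id=dna_r10.4.1_e8.2_400bps_hac@v4.2.0",
    "ACGT",
    "+",
    "5555",
]
model = first_model(fastq)
print(model, accuracy_tier(model))
```

For gzipped files, open them with `gzip.open(path, "rt")` and pass the handle to `first_model`.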
Thank you. I found it to be
basecall_model_version_id=dna_r10.4.1_e8.2_400bps_hac@v4.2.0
but my Q-score is Q16 on average. I have already generated assemblies using --nano-raw and polished them, but now I am unsure whether I should have used --nano-hq.
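As a rough sanity check, the Q-scores mentioned in this thread can be converted to error probabilities and compared with the thresholds in the Flye help text quoted above (<5% for --nano-hq, <20% for --nano-raw). This is only a sketch: it assumes the average Q-score can be read as a single Phred value, it uses the basecaller's estimate instead of an error rate measured against a reference, and `modes_consistent_with` is a made-up helper, not anything Flye provides.

```python
def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


# Error-rate limits copied from the Flye help text quoted in this thread.
FLYE_HELP_THRESHOLDS = {"--nano-hq": 0.05, "--nano-raw": 0.20}


def modes_consistent_with(error_rate):
    """List the read-type flags whose stated error limit the rate falls under."""
    return [flag for flag, limit in FLYE_HELP_THRESHOLDS.items() if error_rate < limit]


for q in (16, 16.7, 20):
    err = phred_to_error(q)
    print(f"Q{q}: ~{err:.2%} error, consistent with {modes_consistent_with(err)}")
```

Q16 works out to roughly 2.5% error and Q16.7 to roughly 2.1%, which is in line with the 2–2.2% reported earlier and below the 5% limit stated for --nano-hq.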