Estimate error rate and assembly option for Flye assembler
4 weeks ago
Umer ▴ 160

Hi reader,

I have some nanopore-sequenced fungal samples, sequenced with an R10.4 flow cell and the LSK114 kit.

Some initial information:

  • Median length (average): 8055 bp
  • N50 (average): 11109 bp
  • Median quality (average): 16.7

Question 1: I am not sure what the term "error rate" means in sequencing. Can you point me to something I can read about it, and explain how to calculate it for my data sets?

The sequencing company already basecalled the data with Dorado and provided FASTQ files split into pass/fail. I want to generate assemblies, so I am using the FASTQ pass files for downstream analysis.

I have used Porechop to remove adapters and Chopper to remove reads with Q<10 and length<2000.

I am using Flye v2.9.5-b1801 for genome assembly. I have a question about selecting the read-type option.

Question 2: Which of these two options is more suitable for my data?

--nano-raw path [path ...]
                        ONT regular reads, pre-Guppy5 (<20% error)
--nano-hq path [path ...]
                        ONT high-quality reads: Guppy5+ SUP or Q20 (<5% error)

Also, is the --scaffold option in Flye recommended?

Thank you.

flye assembly genome error-rate nanopore • 470 views
4 weeks ago

Error rate is typically reported as a Phred-like quality score:

https://en.wikipedia.org/wiki/Phred_quality_score

The score E is plugged into the formula

Error probability = 10^(-E/10)

So, for example, E=20 gives 10^-2, so P = 1/100 = 0.01. That is a 1% error rate, or one error in every hundred basecalls.

People call the score E or Q. The error probability is sometimes shown as a fraction P and sometimes as a percent, so it can be a bit confusing.
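
As a quick illustration of the formula above, here is a minimal Python sketch that converts between a Phred score and an error probability. The function names are made up for this example and do not come from any tool mentioned in the thread.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred score to an error probability: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse conversion: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


# Print the error probability for a few scores, including the
# median quality of 16.7 reported in the question.
for q in (10, 16.7, 20, 30):
    p = phred_to_error_prob(q)
    print(f"Q{q}: P = {p:.4f} ({p * 100:.2f}% error)")
```

With this conversion, the Q16.7 median from the question corresponds to an error probability of a little over 2%.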


Thank you for the clarification.

Based on my raw data I get error rates of around 2–2.2%. Should I use --nano-hq or --nano-raw?

What do you suggest, based on your experience?
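
For anyone wanting to reproduce a per-read error estimate like the one above, here is a rough Python sketch. It assumes standard 4-line FASTQ records with Phred+33 quality encoding, and it averages error probabilities rather than Q values. Averaging Q values directly gives a more optimistic number, and different tools may define "mean quality" differently. The function names and the example record are made up.

```python
import math


def read_error_rate(quality: str, offset: int = 33) -> float:
    """Mean per-base error probability of one read, from its quality string."""
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return sum(probs) / len(probs)


def fastq_error_rates(lines):
    """Yield (read_id, error_rate) for 4-line FASTQ records."""
    lines = [ln.rstrip("\n") for ln in lines]
    for i in range(0, len(lines) - 3, 4):
        header, _seq, _plus, qual = lines[i:i + 4]
        if qual:  # skip empty reads
            yield header[1:].split()[0], read_error_rate(qual)


# Made-up record: '5' is ASCII 53, i.e. Phred 20 with offset 33.
example = [
    "@read1 made_up_example",
    "ACGT",
    "+",
    "5555",
]
for read_id, err in fastq_error_rates(example):
    print(read_id, f"{err * 100:.2f}%", f"Q{-10 * math.log10(err):.1f}")
```

To run it on a real file, pass an open file handle, for example `fastq_error_rates(open("reads.fastq"))`, where the file name is a placeholder.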


If the data has been basecalled with Dorado using a super accuracy (SUP) or high accuracy (HAC) model, then use --nano-hq.
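
The rule of thumb above can be written down as a small Python sketch that picks the read-type flag from a model name and builds a Flye command line. The helper names, file names and thread count are hypothetical. The fallback to --nano-raw for any other model is an assumption of this sketch, not something stated in this thread.

```python
def flye_read_flag(model_name: str) -> str:
    """Pick a Flye read-type flag from a basecalling model name."""
    name = model_name.lower()
    if "_hac" in name or "_sup" in name:
        return "--nano-hq"
    # Assumption: anything else is treated as regular ONT reads.
    return "--nano-raw"


def flye_command(reads: str, out_dir: str, model_name: str, threads: int = 8):
    """Build the argument list for a Flye run (does not execute it)."""
    return [
        "flye", flye_read_flag(model_name), reads,
        "--out-dir", out_dir,
        "--threads", str(threads),
    ]


# Placeholder file names; the model string is only an example.
cmd = flye_command("reads.pass.filtered.fastq", "flye_out",
                   "dna_r10.4.1_e8.2_400bps_hac@v4.2.0")
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files.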


Hi. What if I don't know which Dorado model was used? Is there any way to find that out from the fast5 or FASTQ files, or from the summary file created for each sample?


The FASTQ read headers should contain the model used for basecalling. I see that in the rebasecalled files I work with. If the data is high or super accuracy, the Q-score distribution should be better than Q20 on average.


Thank you. I found it to be basecall_model_version_id=dna_r10.4.1_e8.2_400bps_hac@v4.2.0, but my Q-score is Q16 on average. I have already generated assemblies using --nano-raw and polished them, but now I am unsure whether I should have used --nano-hq.
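
For reference, a tag like the one quoted above can be pulled out of a FASTQ header line with a short Python sketch. It assumes the header carries a `basecall_model_version_id=...` field as in this thread; the rest of the example header (read name, `runid` field) is made up, and headers from other basecaller versions may be laid out differently.

```python
import re

# Matches the key=value tag seen in this thread's FASTQ headers.
TAG = re.compile(r"basecall_model_version_id=(\S+)")


def basecall_model(header: str):
    """Return the basecalling model named in a FASTQ header, or None."""
    match = TAG.search(header)
    return match.group(1) if match else None


# Made-up header carrying the tag reported above.
header = ("@read1 runid=abc "
          "basecall_model_version_id=dna_r10.4.1_e8.2_400bps_hac@v4.2.0")
print(basecall_model(header))
```

Checking only the first read header is usually enough if every read in the file was basecalled in the same run, which is an assumption worth verifying for merged files.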
