Low basecalling of raw ONT data
2.1 years ago
c.eskiw ▴ 10

Good day,

Just wondering what the normal pass rate is for basecalling ONT data? I know the data is inherently noisy, but I am losing about half of it, with ~50% of the data ending up in the 'fail' folder. Are there any settings that would increase the pass rate?

Here is the code I am using with the associated messages:

C:\Users\chris>"C:\Program Files\OxfordNanopore\ont-guppy-cpu\bin\guppy_basecaller.exe" --input_path F:\ProximaData\220916-CHE014\CHE014\20220916_1641_MN39611_FLO004_54e83c3b\fast5\ --save_path F:\ProximaData\220916-CHE014\CHE014\20220916_1641_MN39611_FLO004_54e83c3b\fast5\ONToutput -c dna_r9.4.1_450bps_fast.cfg --num_callers 7 --cpu_threads_per_caller 4
ONT Guppy basecalling software version 6.3.8+d9e0f648d, minimap2 version 2.22-r1101
config file:        C:\Program Files\OxfordNanopore\ont-guppy-cpu\data\dna_r9.4.1_450bps_fast.cfg
model file:         C:\Program Files\OxfordNanopore\ont-guppy-cpu\data\template_r9.4.1_450bps_fast.jsn
input path:         F:\ProximaData\220916-CHE014\CHE014\20220916_1641_MN39611_FLO004_54e83c3b\fast5\
save path:          F:\ProximaData\220916-CHE014\CHE014\20220916_1641_MN39611_FLO004_54e83c3b\fast5\ONToutput
chunk size:         2000
chunks per runner:  160
minimum qscore:     8
records per file:   4000
num basecallers:    7
cpu mode:           ON
threads per caller: 4

Use of this software is permitted solely under the terms of the end user license agreement (EULA).By running, copying or accessing this software, you are demonstrating your acceptance of the EULA.
The EULA may be found in C:\Program Files\OxfordNanopore\ont-guppy-cpu\bin
Found 258 input read files to process.
Init time: 56 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 9677368 ms, Samples called: 6817789498, samples/s: 704509
Finishing up any open output files.
Basecalling completed successfully.

Thanks in advance, Chris

ONT basecalling • 1.7k views

ONT software puts reads into the "fastq_fail" directories when they fall below the "bar", but you may want to look through them, since there can still be a lot of usable data there, especially if you have a good reference available.

The "bar" is the minimum qscore, which can be changed at basecalling time. The default, which you can see in your guppy output, is 8, so every read scoring below that is put in the 'fail' directory.
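
To make the threshold concrete, here is a minimal sketch of how a read could be scored against it. It assumes the read-level qscore is derived by averaging the per-base error probabilities (not the per-base Q values) and that the FASTQ uses the standard Phred+33 encoding; the function names `mean_qscore` and `classify` are made up for illustration and are not part of guppy. As far as I know, guppy exposes the threshold itself through its `--min_qscore` option, but check `guppy_basecaller --help` for your version.

```python
import math


def mean_qscore(quality: str, offset: int = 33) -> float:
    """Read-level Q-score from a FASTQ quality string.

    Assumption: the score is the Phred transform of the mean per-base
    error probability, which weights low-quality bases heavily.
    """
    if not quality:
        return 0.0
    # Convert each Phred+33 character to an error probability.
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return -10 * math.log10(sum(probs) / len(probs))


def classify(quality: str, min_qscore: float = 8.0) -> str:
    """Return 'pass' or 'fail' for a read, mirroring the default bar of 8."""
    return "pass" if mean_qscore(quality) >= min_qscore else "fail"


if __name__ == "__main__":
    print(classify("5555"))  # four bases at Q20
    print(classify("&&&&"))  # four bases at Q5
    print(classify("I#"))    # one Q40 base and one Q2 base
```

Note that a read with one Q40 base and one Q2 base scores about Q5 under this averaging, not Q21, which is one reason reads with a few very poor stretches can land in the fail folder.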

2.1 years ago
  1. You seem to be using the fast basecalling model, which might affect the pass/fail status of each read. Can you re-basecall on a system with a GPU, for example on your local cluster?
  2. What kind of data is this? Amplicon data might have lower pass rates than WGS (just an idea). Did sample prep go smoothly?
  3. I would generally expect >80% of reads to go into the pass folder.
  4. IMHO, ONT filtering workflows are generally less clear than those for short reads.

Thanks for the response.

I am limited to the machine I am using, which runs in CPU mode (8 cores with 64 GB of RAM). The data is WGS of yeast and, as far as I know, the prep went well. I was planning on aligning to S288C as a reference genome, but I suspect the strains I am analyzing will have significant drift and divergence. As such, I was also planning on de novo assembly as well as annotation. I am also open to any suggestions on linking assemblies from Flye or SPAdes with Liftoff or liftOver.

All the best, Chris

"I was planning on aligning to S288C as a ref genome but I suspect the strains I am analyzing will have significant drift and divergence."

In that case you would definitely need to rerun the basecalling with the high-accuracy or super-accuracy basecalling models.
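
As a rough sketch of what the rerun might look like, the helper below builds the same command line used in the question with a different config. The config names `dna_r9.4.1_450bps_hac.cfg` and `dna_r9.4.1_450bps_sup.cfg` are my assumption for the high- and super-accuracy counterparts of the fast config; confirm they exist in your guppy `data` directory. The function name, the placeholder paths, and the thread settings (2 callers × 4 threads, chosen to match an 8-core machine) are illustrative only.

```python
def guppy_command(
    input_path: str,
    save_path: str,
    config: str = "dna_r9.4.1_450bps_hac.cfg",  # assumed config name
    num_callers: int = 2,
    cpu_threads_per_caller: int = 4,
) -> list[str]:
    """Build a guppy_basecaller argument list using the flags from the question."""
    return [
        "guppy_basecaller",
        "--input_path", input_path,
        "--save_path", save_path,
        "-c", config,
        "--num_callers", str(num_callers),
        "--cpu_threads_per_caller", str(cpu_threads_per_caller),
    ]


if __name__ == "__main__":
    # Placeholder paths; substitute the real fast5 and output directories.
    cmd = guppy_command("fast5", "basecalled_hac")
    print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the paths are filled in. Expect these models to run much more slowly than the fast model on a CPU.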
