Question

Should I use the basecaller Dorado to analyze my Nanopore Data?

1

15 months ago

kerianaleerivera ▴ 10

I am working on identifying bacterial communities in the fruit fly gut. In October we sequenced our samples using Nanopore technology, and the results showed no families belonging to a specific type of bacteria that we have studied in the lab for a while. I decided to reanalyze the data myself, but I am unsure what the correct approach is. Apparently Nanopore has a new basecaller called Dorado, and I have converted my FAST5 files to POD5 so that I can use it. From what I found online, the output is supposed to be a .cram file.

Should I convert the CRAM files to FASTA or FASTQ and then import them into QIIME 2 for taxonomic classification or visualization? Or should I skip the basecaller, clean the data with other tools, and then import it into QIIME 2?

To clean the data I would use the following:

Remove the adapters using Porechop
Trim and filter the reads with NanoFilt
Filter sequences with fastp
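
The first two cleaning steps could be chained roughly as follows. This is only a sketch: the function name `clean_reads`, the file names, and the default thresholds (mean quality 10, minimum length 1000 bp) are placeholders chosen for illustration, not recommendations from this thread. The fastp step is left out because NanoFilt already filters on quality and length.

```shell
#!/bin/sh
# Sketch of the adapter-removal and filtering steps listed above.
# Thresholds and file names are assumptions for illustration only.

# clean_reads RAW_FASTQ_GZ OUT_FASTQ_GZ [MIN_Q] [MIN_LEN]
clean_reads() {
    raw=$1
    out=$2
    min_q=${3:-10}      # assumed minimum mean read quality
    min_len=${4:-1000}  # assumed minimum read length in bp

    # Intermediate file holding the adapter-trimmed reads.
    trimmed="${out%.fastq.gz}.trimmed.fastq.gz"

    # 1. Remove adapters with Porechop.
    porechop -i "$raw" -o "$trimmed" || return 1

    # 2. Filter by quality and length with NanoFilt,
    #    which reads FASTQ on stdin and writes to stdout.
    gunzip -c "$trimmed" | NanoFilt -q "$min_q" -l "$min_len" | gzip > "$out"
}
```

A call such as `clean_reads raw.fastq.gz clean.fastq.gz 10 1000` would then leave the filtered reads in `clean.fastq.gz`.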

Dorado Nanopore • 7.5k views

ADD COMMENT • link updated 15 months ago by cfos4698 ★ 1.1k • written 15 months ago by kerianaleerivera ▴ 10

0

You can just use Guppy for basecalling and get FASTQ output; there is no need to use Dorado.

ADD REPLY • link 15 months ago by MatthewP ★ 1.4k

score 2 · Answer 1 · 2023-12-14

2

15 months ago

GenoMax 150k

Dorado is now the preferred basecaller. Its output is an unaligned BAM file, though you can also get it to emit FASTQ data (if you prefer that).
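
If the unaligned BAM already exists, one way to get FASTQ out of it is `samtools fastq`. The sketch below assumes samtools is installed; the function name `bam_to_fastq` and the file names are hypothetical.

```shell
#!/bin/sh
# bam_to_fastq UNALIGNED_BAM OUT_FASTQ
# Converts an unaligned BAM to FASTQ and prints the number of reads written.
bam_to_fastq() {
    samtools fastq "$1" > "$2" || return 1
    # Each FASTQ record spans four lines when sequences are not wrapped.
    awk 'END { print NR / 4 }' "$2"
}
```

For example, `bam_to_fastq calls.bam calls.fastq` writes `calls.fastq` and prints the read count.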

ADD COMMENT • link 15 months ago by GenoMax 150k

1

As a side note, if you're using one of the base modification models, I've found it easier to have Dorado do the alignment as well. That lets you use the aligned BAM as a direct input to modkit.
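
A rough sketch of that workflow, assuming Dorado, samtools, and modkit are all on the PATH. The function name `call_mods` and every argument value are placeholders, and the sort and index steps are included on the assumption that modkit expects a coordinate-sorted, indexed BAM.

```shell
#!/bin/sh
# call_mods MODEL POD5_DIR REFERENCE_FASTA MODS PREFIX
# MODS is a modified-base model name such as 5mCG_5hmCG (example value only).
call_mods() {
    model=$1
    pod5_dir=$2
    ref=$3
    mods=$4
    prefix=$5

    # Basecall with modified bases and align to the reference in one step.
    dorado basecaller "$model" "$pod5_dir" \
        --modified-bases "$mods" --reference "$ref" > "$prefix.bam" || return 1

    # Sort and index the aligned BAM before handing it to modkit.
    samtools sort -o "$prefix.sorted.bam" "$prefix.bam" || return 1
    samtools index "$prefix.sorted.bam" || return 1

    # Summarize modification calls per reference position.
    modkit pileup "$prefix.sorted.bam" "$prefix.bed"
}
```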

ADD REPLY • link 15 months ago by rpolicastro 13k

0

I will keep this in mind! Thank you. Also, if I have the FASTQ files that were provided by Nanopore, is it better to just clean those files, or do you recommend starting from the raw data and redoing the basecalling?

ADD REPLY • link 15 months ago by kerianaleerivera ▴ 10

1

Dorado is now the default caller for MinKNOW. If a recent version was used then it is possible that dorado was already used. In that case you can start working with fastq directly.

ADD REPLY • link 15 months ago by GenoMax 150k

0

Thank you for the reply! I am reading through the Dorado documentation and do not see an option that would let me emit FASTQ data. Do you perhaps have a link or the command that I can use?

ADD REPLY • link 15 months ago by kerianaleerivera ▴ 10

2

dorado basecaller --emit-fastq model_file POD5_folder > file.fastq

That is with v0.4.3; I see that 0.5.0 is out. This software is updated frequently.

ADD REPLY • link 15 months ago by GenoMax 150k

score 1 · Answer 2 · 2024-01-04

1

15 months ago

colindaven 7.3k

As your sequencing was done in October, you won't get much value out of re-basecalling. Re-basecalling makes sense if your data are much older; e.g., I got much better data (~4% higher percent identity) by re-basecalling 2021 data in 2023 with the appropriate model.

But re-basecalling 3 months later likely makes little sense.

I would focus my efforts on using other metagenomic long read binners with different databases behind them to find your bacteria of interest, if present.

Is this 16S or whole-metagenome data? For 16S, try https://gitlab.com/treangenlab/emu

ADD COMMENT • link 15 months ago by colindaven 7.3k

0

The "appropriate model" part is an important point. It'd be worth checking whether the reads were basecalled with a HAC or SUP model vs just a fast model.
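
One way to check could be to look at the read headers of the provided FASTQ files. The sketch below assumes the headers carry a `basecall_model_version_id=` field, as FASTQ files written by recent MinKNOW versions appear to; older files may lack it, in which case nothing is printed. The function name `list_basecall_models` is hypothetical, and the input is assumed to be an uncompressed FASTQ with four lines per record.

```shell
#!/bin/sh
# list_basecall_models FASTQ
# Prints each distinct basecall model found in the read headers,
# preceded by the number of reads that carry it.
list_basecall_models() {
    # Header lines are every fourth line, starting at line 1.
    awk 'NR % 4 == 1' "$1" \
        | sed -n 's/.*basecall_model_version_id=\([^ ]*\).*/\1/p' \
        | sort | uniq -c
}
```

A model name containing `_fast@`, `_hac@`, or `_sup@` (for example `dna_r10.4.1_e8.2_400bps_hac@v4.2.0`) then indicates which accuracy tier was used.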

ADD REPLY • link 15 months ago by cfos4698 ★ 1.1k