Hello,
I am new to processing Nanopore sequencing data and have run into an issue:
I have binary fast5 files straight from the Nanopore sequencer (R9 flowcell) and I would like to use Dorado for basecalling, as it seems to be the preferred tool. I have already used dorado for simplex basecalling on pod5 files, passing "hac" as the model argument as described on the nanoporetech/dorado GitHub page.
However, I can't seem to make it work on fast5 files even though the documentation says that it is supported for simplex basecalling (though less performant). This is what I type in my terminal:
$ dorado basecaller hac /directory/to/my/fast5/files --emit-fastq > output.fastq
I keep getting this error:
[error] Cannot automate model selection using fast5 files
I have tried using "fast" or "sup" instead of "hac" in case it made a difference, but to no avail.
Is there a specific model I should use or download? Are there any other tools you could recommend for basecalling from fast5 files? I know about guppy; however, I am unable to download it as it is an ONT tool.
Any help would be greatly appreciated.
Thanks,
Lele
Thank you for your reply. I have just tried this, as well as hac@v4.2.0, but I still get the same error... Guess I'll have to figure out another way.
Have you tried specifying the exact model? For instance
dna_r9.4.1_e8_hac@v3.3
for an R9 flowcell.
Just replaced hac with the full name of one of the downloaded models and it works. Silly mistake on my part, as hac@latest still automatically chooses a model for you, as it says in the model complex table.
Thanks again!
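For anyone landing here later, the fix above can be sketched as a small shell function. This is only a sketch under a few assumptions: the function name basecall_r9_hac is made up for illustration, the input directory and output file are placeholders, and it assumes that dorado download places the model in a folder named after the model in the current directory, so that the first argument to dorado basecaller is read as a path to that folder.

```shell
# Sketch: basecall R9 fast5 files with an explicitly named model.
# basecall_r9_hac is a hypothetical helper name, not part of dorado.
basecall_r9_hac() {
    # $1 = directory containing the fast5 files
    # $2 = output FASTQ file
    model="dna_r9.4.1_e8_hac@v3.3"

    # Fetch the model into the current directory (needs internet access).
    dorado download --model "$model"

    # Pass the downloaded model folder instead of the bare "hac" shortcut,
    # since automatic model selection is not available for fast5 input.
    dorado basecaller "$model" "$1" --emit-fastq > "$2"
}

# Usage (placeholder paths):
#   basecall_r9_hac /directory/to/my/fast5/files output.fastq
```

Check that the model matches your pore and chemistry before running; dna_r9.4.1_e8_hac@v3.3 is simply the one suggested above for an R9 flowcell.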
Would that have changed lately?
I was chatting with one of the developers of dorado recently (in the context of a bug report) and he told me that the model name starts at 'hac', 'fast' or 'sup' and that all the text before it should be omitted. Otherwise dorado assumes you are providing a path to a model and will fail. (Alternatively, you can indeed provide the full path to the model, and then you need to include the dna_..._ part in the name.)
They acknowledge themselves it is indeed a bit confusing ;)
(this is for dorado from at least v0.8 onwards)
If you have access to the internet, then dorado will automatically download the correct model. You only need to specify the level of calling as hac/sup etc.
That indeed works as well, but this is for the case where you (for some reason) would like to use a specific model (and/or a specific version of it).
Interesting point though: would you assume that most (all?) users always use the most recent model? If so, then the system where you just ask for 'hac', 'sup' or 'fast' is indeed likely the easiest (and would make the keeping-it-up-to-date work a lot smoother :) ).
If you are running it on HPC systems without internet access, it's a different ballgame of course.
Main point is you want to make sure you use the right pore version. That seems to automagically happen if you have internet access.
True indeed, but if you pre-download all models, dorado will also do that without internet access.
Well, it will always do that without internet access (the information is in the pod5/fast5 files themselves), but downloading the models, if you don't have them already, will of course be difficult without internet access :)
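For the HPC case, a rough sketch of the pre-download approach could look like the following. The assumptions here: MODELS_DIR and both function names are hypothetical, and the --models-directory option of dorado download is assumed to be available in your release (older releases used --directory instead, so check dorado download --help first).

```shell
# Hypothetical shared location that both the login node and the
# compute nodes can read.
MODELS_DIR="${MODELS_DIR:-$HOME/dorado_models}"

# Step 1: run once on a machine WITH internet access.
fetch_models() {
    mkdir -p "$MODELS_DIR"
    # "all" fetches every available model; a single model name also works.
    dorado download --model all --models-directory "$MODELS_DIR"
}

# Step 2: run on the offline node, pointing at a pre-downloaded model.
basecall_offline() {
    # $1 = model folder name, e.g. dna_r9.4.1_e8_hac@v3.3
    # $2 = directory with the input reads
    # $3 = output FASTQ file
    dorado basecaller "$MODELS_DIR/$1" "$2" --emit-fastq > "$3"
}

# Usage (placeholder paths):
#   fetch_models
#   basecall_offline dna_r9.4.1_e8_hac@v3.3 /directory/to/my/fast5/files output.fastq
```

Giving the full path to the model folder sidesteps automatic selection entirely, which is why it also works for fast5 input.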