Hi,
I am currently developing a workflow that takes raw FAST5 files from Oxford Nanopore sequencing, basecalls them, and then re-squiggles the basecalls against a reference genome. Previous workflows for this used Albacore for basecalling followed by Tombo for the re-squiggle. I am now using ONT's updated basecaller (dorado), and I am confused about how to take the basecalls produced by dorado and use them in another tool called remora. I am using remora because it is the successor to Tombo (the previous software used for re-squiggling). How could I follow a workflow similar to Albacore followed by Tombo, but using the two newer tools, dorado and remora?
Which part is confusing?
dorado can output FASTQ calls, which I assume you can use for remora. While dorado can use FAST5 files, it is much faster when the FAST5 files are converted to POD5 format. Actually, dorado methylation-call BAM files can go directly into modkit (LINK), from what Bob Policastro was saying.
For remora, you need to use a POD5 file and a BAM. It's unclear from their GitHub how to just re-squiggle the bases against a reference genome. There is no re-squiggle command as there was for Tombo, which is why I am confused about this part...
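A minimal sketch of the first half of this workflow (FAST5 to POD5, then basecalling), assuming the pod5 and dorado command-line tools are on PATH and a dorado version that accepts a model shorthand such as hac, a directory of POD5 input, and the --emit-moves and --reference options. The helper names (build_commands, run) and all file paths are hypothetical. The sketch writes BAM rather than FASTQ because, as far as I understand, remora reads the move table and reference alignment from the BAM together with the raw signal from the POD5, and a FASTQ carries neither. The remora step itself is deliberately left out; check the remora README and example notebooks for its current signal-refinement API.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(fast5_files: List[str], pod5_dir: str,
                   model: str, reference: str) -> List[List[str]]:
    """Return argv lists for FAST5->POD5 conversion and dorado basecalling.

    Hypothetical helper: paths and model name are placeholders.
    """
    # Convert all FAST5 inputs into a single POD5 file inside pod5_dir.
    convert = ["pod5", "convert", "fast5", *fast5_files,
               "--output", str(Path(pod5_dir) / "converted.pod5")]
    # Basecall the POD5 directory; dorado writes BAM to stdout.
    basecall = ["dorado", "basecaller", model, pod5_dir,
                "--emit-moves",            # keep the move table in the BAM
                "--reference", reference]  # align the calls to the reference
    return [convert, basecall]


def run(commands: List[List[str]], bam_out: str) -> None:
    """Run the conversion, then redirect dorado's BAM output to bam_out."""
    convert, basecall = commands
    # The --output path is the last element of the convert command.
    Path(convert[-1]).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(convert, check=True)
    with open(bam_out, "wb") as out:
        subprocess.run(basecall, check=True, stdout=out)
```

For example, run(build_commands(sorted(glob.glob("fast5/*.fast5")), "pod5", "hac", "ref.fa"), "calls.bam") would produce an aligned BAM with move tables next to the converted POD5, which are the two inputs remora asks for.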
Also, I am not doing this for methylation; I just want to call all of the bases and then re-squiggle the output.
But why do you want to re-squiggle?
To correct for variation. The first step is just to get the reads from the raw current; the second is to re-squiggle those basecalled reads using a reference genome.
Are you following some very old pipeline to detect nucleotide modifications?
No, I am not detecting methylation or any other nucleotide modifications. When raw reads are basecalled, some bases can be called incorrectly. Re-squiggling algorithms use a reference genome and the raw current to correct for bases that were basecalled incorrectly.
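To make the idea concrete, here is a toy illustration of what a re-squiggle step computes: given the raw current samples of a read and an expected current level for each reference base (in practice these come from a k-mer level table; here they are simply given per base), a dynamic program assigns contiguous, in-order runs of samples to reference bases so that the total squared deviation from the expected levels is minimal. This is a generic segmentation DP written for illustration under those simplifying assumptions, not Tombo's or Remora's actual algorithm, and the function name segment_signal is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def segment_signal(signal: Sequence[float],
                   levels: Sequence[float]) -> Tuple[List[int], float]:
    """Assign samples to reference bases in order (hypothetical toy).

    Returns the start sample index of each base and the total squared error.
    Every base must receive at least one sample.
    """
    n, m = len(signal), len(levels)
    if n < m:
        raise ValueError("need at least one sample per reference base")
    inf = math.inf
    # cost[i][j]: best cost of placing the first i samples on the first j
    # bases, with sample i-1 assigned to base j-1.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, min(i, m) + 1):
            stay = cost[i - 1][j]       # sample i-1 extends base j-1's segment
            move = cost[i - 1][j - 1]   # sample i-1 starts base j-1's segment
            cost[i][j] = (signal[i - 1] - levels[j - 1]) ** 2 + min(stay, move)
    # Trace back to recover where each base's segment starts.
    starts = [0] * m
    i, j = n, m
    while j > 0:
        if cost[i - 1][j - 1] <= cost[i - 1][j]:
            starts[j - 1] = i - 1  # this sample opened the segment for base j-1
            j -= 1
        i -= 1
    return starts, cost[n][m]
```

For example, segment_signal([1.0, 1.1, 0.9, 5.0, 5.2, 3.0, 2.9], [1.0, 5.0, 3.0]) places the three bases at sample offsets 0, 3 and 5. Note that this maps signal onto the reference sequence; it does not rewrite the basecalls themselves.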
Things have changed significantly since the early days of ONT. WouterDeCoster may have a specific comment, but this may no longer be required. If you have historical data, you could compare it with current dorado calls.
Yes, I don't think this is necessary at all (or ever was).
It is actually part of ONT preprocessing pipelines. It still exists, but "re-squiggle" has been renamed "signal mapping refinement". I am not sure where you are getting your information from, but from what I have been doing, I believe this is still an important step for what I want to achieve. Thanks