Hello
I am currently trying to compare simplex base calling with duplex base calling using Dorado. I am using the raw binary output from the Nanopore sequencer (.pod5 files), generated with an R10.4 flow cell.
For each of those files, I do simplex base calling using the latest hac model, and duplex base calling with the latest sup model. However, I end up with more reads after duplex base calling than after simplex base calling for the same pod5 input file.
- Is this supposed to happen? And if so, why? It may be because of the different models used (hac compared to sup), but that is just a guess.
I would have thought that the "duplex reads" you get with duplex base calling would be a subset of the reads from simplex base calling, only of higher quality, since the Nanopore reads both strands of the DNA to produce the output.
Furthermore, I do not understand the different tags for reads in the BAM file after base calling, as described on the GitHub page:
The dx tag in the BAM record for each read can be used to distinguish between simplex and duplex reads:
- dx:i:1 for duplex reads.
- dx:i:0 for simplex reads which don't have duplex offsprings.
- dx:i:-1 for simplex reads which have duplex offsprings.
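To make the three tag values concrete, here is a minimal sketch of how they could be tallied. It assumes the BAM has first been converted to SAM text (for example with `samtools view`), and it uses only the Python standard library. The function name `count_dx_tags` and the example records are hypothetical and made up for illustration; they are not Dorado output.

```python
from collections import Counter

def count_dx_tags(sam_lines):
    """Tally dx tag values over SAM text lines (header lines are skipped)."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        dx = "missing"
        # Optional tags start at the 12th column of a SAM record.
        for tag in fields[11:]:
            if tag.startswith("dx:i:"):
                dx = int(tag[len("dx:i:"):])
                break
        counts[dx] += 1
    return counts

def make_record(name, dx):
    """Build a made-up unaligned SAM record carrying a dx tag (illustration only)."""
    return "\t".join([name, "4", "*", "0", "0", "*", "*", "0", "0",
                      "ACGT", "IIII", f"dx:i:{dx}"])

# Hypothetical example: two simplex parents, their duplex offspring,
# and one simplex read without duplex offspring.
example = [
    "@HD\tVN:1.6\tSO:unknown",
    make_record("parent_a", -1),
    make_record("parent_b", -1),
    make_record("parent_a;parent_b", 1),
    make_record("lonely_read", 0),
]
dx_counts = count_dx_tags(example)
print(dx_counts)
```

The function accepts any iterable of lines, so an open file handle containing SAM text could be passed in place of the example list.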
As I understand it, dx:i:1 reads are base called using the sequence of both strands of the DNA, and dx:i:0 reads are base called using only one strand because the complementary strand did not go through the pore.
- It is the dx:i:-1 tag that I don't understand. Are those reads the two individual strands of DNA that were sequenced to produce a duplex read? Should I keep all of those reads for the rest of my analysis of the Nanopore data, or should I remove some of them, since some reads may be counted multiple times?
I have compared the counts of the different tags in my BAM file with the numbers of reads in the simplex and duplex outputs to see whether they line up, but they don't.
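The comparison described above can be sketched as follows. This is only an illustration under stated assumptions: it takes the tag description quoted earlier at face value (the duplex BAM holds simplex records tagged 0 or -1 alongside duplex records tagged 1), the function name `summarize_dx` is hypothetical, and the counts are made-up numbers, not real results. Whether the simplex subtotal should match a separate simplex-only run exactly is not something this sketch can confirm.

```python
def summarize_dx(counts):
    """Derive subtotals from a mapping of dx value -> number of records."""
    duplex = counts.get(1, 0)                # dx:i:1, duplex reads
    simplex_no_offspring = counts.get(0, 0)  # dx:i:0, simplex without duplex offspring
    simplex_parents = counts.get(-1, 0)      # dx:i:-1, simplex with duplex offspring
    return {
        # Every record in the duplex BAM, whatever its tag.
        "total_records": duplex + simplex_no_offspring + simplex_parents,
        # Simplex records only; the figure to set against a simplex-only run.
        "simplex_records": simplex_no_offspring + simplex_parents,
        # Records left if the dx:i:-1 parents were dropped.
        "without_parents": duplex + simplex_no_offspring,
    }

# Made-up counts for illustration only.
summary = summarize_dx({1: 10, 0: 70, -1: 20})
for name, value in summary.items():
    print(name, value, sep="\t")
```

With these made-up numbers the total exceeds the simplex subtotal by exactly the number of duplex records, which is one way a duplex run could report more reads than a simplex run on the same input.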
I am just having trouble understanding this part of the documentation and why I have more reads when using duplex instead of simplex. If anyone could enlighten me I would greatly appreciate it.
Thank you
I assume you have posted this in the Nanopore Community and/or on the Dorado issues page on GitHub. Please come back and post an update once you get an answer.