I want to detect modifications in a run of unamplified human DNA. The sequencing output was fast5, but since I'm using dorado I first convert the reads to pod5. I then run dorado basecaller with the following command, which uses the models dna_r9.4.1_e8_hac@v3.3 and dna_r9.4.1_e8_hac@v3.3_5mCG_5hmCG@v0:
dorado basecaller hac,5mCG_5hmCG run_16_wo_WGA_pod5/ -r --min-qscore 7 -x > basecall_modifications/bc_output.bam
If I convert that to SAM I can already tell that there are no reads with the tags MM, MN or ML. Anyway, I run modkit on the BAM file with the command:
modkit pileup ../run_16_aligned.bam pileup.bed --log-filepath pileup.log
And I get a bunch of
[src/mod_bam.rs::183][2024-03-21 11:53:55][ERROR] failed to get modbase info for record 3b3459f9-d842-4b67-a224-5fb0fc8dabd9, Skipped: AUX data not found
And a final line with
[src/pileup/subcommand.rs::783][2024-03-21 12:01:08][INFO] Done, processed 0 rows. Processed ~0 reads and skipped ~443600 reads.
The bed file is obviously empty.
Is there anything wrong with this pipeline or is my data wrong in any way?
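For anyone wanting to repeat the tag check described above (converting to SAM and looking for MM/ML), here is a minimal sketch. It assumes samtools is installed for the BAM-to-SAM step. The function name count_mod_tags is made up for illustration, and the older lowercase Mm/Ml tag spellings are accepted as well. It only counts records that carry both tags; it does not validate their contents.

```shell
# count_mod_tags (hypothetical helper): reads SAM text on stdin and reports
# how many alignment records carry both an MM and an ML tag.
count_mod_tags() {
  awk -F'\t' '
    /^@/ { next }                      # skip SAM header lines
    {
      total++
      has_mm = 0; has_ml = 0
      # optional tags start at column 12 in SAM
      for (i = 12; i <= NF; i++) {
        if ($i ~ /^(MM|Mm):Z:/) has_mm = 1
        if ($i ~ /^(ML|Ml):B:/) has_ml = 1
      }
      if (has_mm && has_ml) tagged++
    }
    END { printf "total=%d with_mods=%d\n", total, tagged }
  '
}

# Usage, assuming samtools is on PATH (paths taken from the post above):
#   samtools view basecall_modifications/bc_output.bam | count_mod_tags
#   samtools view ../run_16_aligned.bam | count_mod_tags
```

Running it on both the basecaller output and the BAM that is passed to modkit shows whether the tags were never written or were lost somewhere in between.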
Was methylation detection turned on in the sequencer software when the run was done? If it was not, that may be the problem.
As far as I know, you don't need to change anything while sequencing; it only changes at basecalling. But maybe I'm wrong.
This is incorrect. At run time, you may specify epigenetic mod calling or not.
Probably what you are referring to is the idea that dorado includes ways to account for modified bases in its model no matter what, which is true. This is done to avoid basecalling errors if modified bases are present in the sample. However, it in no way implies that modified bases will be called as such in your sample if you fail to specify that you would like that to be done.
Ok, in that case that's the problem, thank you!
You're welcome. It seems weird the first time around, because if it needs to process them anyway, why not just report the data? But there are various answers to that question, like "I'm not confident my methylation data will mean anything, because I treated my animals a certain way and I'm not asking that question", or some such.
In that case you might only want to avoid basecalling errors, but not want to call the variants.
I've looked into it and I can't find any information about this. We also looked at the MinKNOW software on the MinION, but we couldn't find the option to include modified bases. How do we do that?
Are you using the latest MinKNOW available?
We updated it just before checking
Hi,
I would post this issue in the issues section of the dorado GitHub page: https://github.com/nanoporetech/dorado/issues
Did you try any of the solutions posted at this link: https://github.com/nanoporetech/dorado/issues/671
I think I'm misunderstanding something. The process has two steps: first sequencing, which generates pod5 files, and then basecalling, which generates fastq/BAM, since we are basecalling after the run was completed. I basecalled with the options to include modified bases, but I didn't find anything. Are you saying that I have to tell the sequencer to detect modified bases in the first step or in the second?
From what I understand, on the GridION, MinKNOW has an explicit toggle to call methylation. I don't know if the software behaves differently on the MinION.