Hello everyone,
I am using modkit to analyse the results from Dorado m6A_DRACH methylation base-calling.
(1) I have generated the bedMethyl file from the BAM file. Now I need filter criteria for "coverage" and "mod_rate" to get rid of noisy predictions.
Can we filter directly on the "Nvalid_cov" column with >= 20 reads, or do we need to normalise it to reads per million?
(2) For differential methylation analysis between conditions I am using dmr pair with the following command:
modkit dmr pair -a c6_r1.bed.gz -a c6_r2.bed.gz -a c6_r3.bed.gz -b dr6_r1.bed.gz -b dr6_r2.bed.gz -b dr6_r3.bed.gz -o dmr_result --ref Genome.fa --base A --threads 96 --log-filepath dmr_result.log
- How does modkit make the unified list of sites from both conditions with replicates?
- How does modkit handle sites that are present in one condition but not in the other?
- What kind of test does modkit apply to identify the DMR sites?
Thanks
Looks like you already opened an issue on the modkit GitHub. That is probably the best place to ask this: https://github.com/nanoporetech/modkit/issues/364
If you get an answer there, please come back to this thread and post it here.
Hi! Why did you want to set a 20-read threshold ("Nvalid_cov" >= 20 reads)? Are you filtering on percent modified ((Nmod / Nvalid_cov) * 100) or on Nmod?
Hello,
(1) The 20-read threshold was meant to filter out sites that are supported by very few reads and may be noise. For the coverage filter I ended up using normalised coverage instead of raw read coverage: I divided the raw read count at each site by the total number of mapped reads in that sample and multiplied by 1 million, then applied a cutoff of 2 (reads per million mapped reads). I normalised because the samples differ in size (number of reads), so a normalised read count removes any library-size bias.
(2) Yes, I also applied a cutoff of 5% for a site to be called modified, i.e. Nmod / Nvalid_cov >= 5%.
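For anyone who wants to reproduce the two filters described above, here is a minimal Python sketch. It is not part of modkit. The function names and the example read counts are made up for illustration, and it assumes you have already pulled Nvalid_cov and Nmod for each site out of the bedMethyl file and know the total number of mapped reads per sample.

```python
def coverage_per_million(n_valid_cov, total_mapped_reads):
    """Scale a site's valid coverage to reads per million mapped reads."""
    return n_valid_cov * 1_000_000 / total_mapped_reads


def passes_filters(n_valid_cov, n_mod, total_mapped_reads,
                   min_cpm=2.0, min_mod_fraction=0.05):
    """Return True if a site passes both the coverage and the mod-rate cutoff.

    min_cpm=2.0 and min_mod_fraction=0.05 are the cutoffs mentioned in this
    thread; they are choices made here, not modkit defaults.
    """
    if n_valid_cov == 0:
        return False  # no valid calls, so the mod fraction is undefined
    cpm = coverage_per_million(n_valid_cov, total_mapped_reads)
    mod_fraction = n_mod / n_valid_cov
    return cpm >= min_cpm and mod_fraction >= min_mod_fraction


# Hypothetical sample with 5,000,000 mapped reads
total_reads = 5_000_000
print(coverage_per_million(20, total_reads))  # 20 reads scaled to per-million
print(passes_filters(20, 2, total_reads))     # enough coverage, 10% modified
print(passes_filters(8, 2, total_reads))      # coverage below 2 per million
print(passes_filters(20, 0, total_reads))     # enough coverage, 0% modified
```

Note that with this normalisation the same raw read count passes in a small library and fails in a large one, so it is worth checking what raw coverage a cutoff of 2 per million corresponds to in each sample.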