I want to study methylation in some samples sequenced with nanopore. I performed basecalling with the 5mCG_5hmCG model and then generated the bedmethyl file with modkit. I looked at the ENCODE description of the format, but there are things I don't understand, and my bed file also has columns that are not described by the ENCODE project. These are the first few rows of the file:
NC_000001.11 10220 10221 h 1 + 10220 10221 255,0,0 1 0.00 0 0 1 0 0 0 0
NC_000001.11 10220 10221 m 1 + 10220 10221 255,0,0 1 100.00 1 0 0 0 0 0 0
NC_000001.11 10232 10233 h 1 + 10232 10233 255,0,0 1 0.00 0 0 1 0 0 0 0
NC_000001.11 10232 10233 m 1 + 10232 10233 255,0,0 1 100.00 1 0 0 0 0 0 0
NC_000001.11 10468 10469 h 1 + 10468 10469 255,0,0 1 0.00 0 1 0 0 0 0 0
NC_000001.11 10468 10469 m 1 + 10468 10469 255,0,0 1 0.00 0 1 0 0 0 0 0
NC_000001.11 10469 10470 h 1 - 10469 10470 255,0,0 1 0.00 0 0 1 0 0 0 0
NC_000001.11 10469 10470 m 1 - 10469 10470 255,0,0 1 100.00 1 0 0 0 0 0 0
I assume that in the fourth column (Name of item) m means 5mCG and h means 5hmCG, but I want to confirm this. As you can see, every position appears twice, once for h and once for m. From what I understand, looking at column 11 (Percentage of reads that show methylation at this position in the genome), the file lists every position for both modifications and, instead of recording whether the position is 5mCG or 5hmCG, puts 100% on the one that is present. After the percentage, though, there are several more columns of 0s and 1s, and I have no idea what they are.
If someone could explain in more detail what these mean it would be very helpful. Also, once I have the bedmethyl, is there an easy way with an already done tool to analyse it? Or do I have to extract the regions I'm interested in and see if they are methylated or what?
Thank you.
P.S. Sorry for the weird formatting of the modkit output; I can't get the lines to have just one line break.
Have you looked at this page that describes the file format:
https://nanoporetech.github.io/modkit/intro_bedmethyl.html
This link no longer works. See my comment below for new links.
It looks like the GitHub link no longer works. I've looked on the modkit GitHub page directly, but I can't find the same documentation. Do you happen to remember what this page contained previously?
Bedmethyl format is described here: https://nanoporetech.github.io/modkit/intro_pileup.html#description-of-bedmethyl-output
Also described at Encode site: https://www.encodeproject.org/data-standards/wgbs/
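To connect that description to the rows in the question, here is a minimal Python sketch. It assumes the 18-column layout from the modkit pileup documentation linked above: columns 10 and 11 are the valid coverage and the percent modified, followed by the counts N_mod, N_canonical, N_other_mod, N_delete, N_fail, N_diff and N_nocall. The modification codes are m for 5mC and h for 5hmC. The class and function names are made up for illustration, so check the column order against the documentation for your modkit version before relying on it.

```python
from typing import NamedTuple


class BedMethylRecord(NamedTuple):
    """One bedMethyl row; field names are illustrative, not modkit's own."""
    chrom: str
    start: int
    end: int
    mod_code: str            # 'm' = 5mC, 'h' = 5hmC
    strand: str
    n_valid_cov: int         # reads with a usable call at this position
    percent_modified: float  # 100 * n_mod / n_valid_cov
    n_mod: int               # reads called as this row's modification
    n_canonical: int         # reads called as unmodified
    n_other_mod: int         # reads called as a different modification
    n_delete: int
    n_fail: int
    n_diff: int
    n_nocall: int


def parse_bedmethyl_line(line: str) -> BedMethylRecord:
    # Split on any whitespace so tab- and space-delimited rows both work.
    f = line.split()
    if len(f) != 18:
        raise ValueError(f"expected 18 columns, got {len(f)}")
    return BedMethylRecord(
        chrom=f[0], start=int(f[1]), end=int(f[2]), mod_code=f[3],
        strand=f[5], n_valid_cov=int(f[9]), percent_modified=float(f[10]),
        n_mod=int(f[11]), n_canonical=int(f[12]), n_other_mod=int(f[13]),
        n_delete=int(f[14]), n_fail=int(f[15]), n_diff=int(f[16]),
        n_nocall=int(f[17]),
    )


def recomputed_percent(rec: BedMethylRecord) -> float:
    # Column 11 should equal this value (up to rounding).
    return 100.0 * rec.n_mod / rec.n_valid_cov if rec.n_valid_cov else 0.0


# Three rows copied from the question.
rows = [
    "NC_000001.11 10220 10221 h 1 + 10220 10221 255,0,0 1 0.00 0 0 1 0 0 0 0",
    "NC_000001.11 10220 10221 m 1 + 10220 10221 255,0,0 1 100.00 1 0 0 0 0 0 0",
    "NC_000001.11 10468 10469 m 1 + 10468 10469 255,0,0 1 0.00 0 1 0 0 0 0 0",
]
records = [parse_bedmethyl_line(r) for r in rows]

for rec in records:
    print(rec.chrom, rec.start, rec.mod_code, rec.n_valid_cov,
          rec.n_mod, rec.n_canonical, rec.n_other_mod,
          recomputed_percent(rec))
```

Read this way, position 10220 is covered by a single read that was called 5mC: the m row counts it under N_mod (100%), while the h row counts the same read under N_other_mod (0%). At 10468 the single read was called unmodified, so it appears under N_canonical. With coverage of 1 the percentages can only be 0 or 100, so you may want to filter on the coverage column before interpreting them.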
Please use the 101010 button to format text as code when you want to show monospaced data.
Hello! I am trying to produce a bedmethyl file from nanopore data as well. I am starting with 5mCG_5hmCG.bam files, and I keep getting stuck somewhere in my pipeline before the bedmethyl files are produced.
Might you be willing to share your process/pipeline to produce these bed methyl files?
thank you!
if you have a BAM file you can run a single modkit command. example here https://github.com/nanoporetech/modkit?tab=readme-ov-file#constructing-bedmethyl-tables
it is pretty fast and easy compared to other difficult-to-use bioinfo tools :)
the file format that it outputs is described in the README there also
thank you! Presumably this single modkit command works if the BAM file is already aligned, correct? This cannot work for unaligned BAM files?
yes, you'll likely want aligned BAM files. i haven't used unaligned BAM a lot, but if that is what you have, you can check e.g. https://lh3.github.io/2021/07/06/remapping-an-aligned-bam which shows how to potentially preserve tags like MM and ML when creating the aligned BAM with the "samtools fastq -OT" method
Follow the directions from here: https://nanoporetech.github.io/modkit/intro_pileup.html
You will need an aligned BAM file which preserves the MM/ML tags. If you are starting with unaligned BAM files containing MM/ML calls then you will need to do
Amazing! thank you both GenoMax and @cmdcoli for your suggestions and links!