I want to study methylation in some samples sequenced with nanopore. I performed basecalling with the 5mCG_5hmCG model and then generated the bedMethyl file with modkit. I looked at the ENCODE description of the format, but there are things I don't understand, and my BED file also contains columns that the ENCODE description does not cover. These are the first few rows of the file:
NC_000001.11 10220 10221 h 1 + 10220 10221 255,0,0 1 0.00 0 0 1 0 0 0 0
NC_000001.11 10220 10221 m 1 + 10220 10221 255,0,0 1 100.00 1 0 0 0 0 0 0
NC_000001.11 10232 10233 h 1 + 10232 10233 255,0,0 1 0.00 0 0 1 0 0 0 0
NC_000001.11 10232 10233 m 1 + 10232 10233 255,0,0 1 100.00 1 0 0 0 0 0 0
NC_000001.11 10468 10469 h 1 + 10468 10469 255,0,0 1 0.00 0 1 0 0 0 0 0
NC_000001.11 10468 10469 m 1 + 10468 10469 255,0,0 1 0.00 0 1 0 0 0 0 0
NC_000001.11 10469 10470 h 1 - 10469 10470 255,0,0 1 0.00 0 0 1 0 0 0 0
NC_000001.11 10469 10470 m 1 - 10469 10470 255,0,0 1 100.00 1 0 0 0 0 0 0
I assume that in the fourth column (name of item) m means 5mCG and h means 5hmCG, but I want to confirm this. As you can see, every position appears twice, once for h and once for m. From what I understand, looking at column 11 (percentage of reads that show methylation at this position in the genome), the file lists every position and, instead of recording whether it is 5mCG or 5hmCG, puts 100% on whichever one is present. But the percentage is followed by some 0s and 1s, and I have no idea what they are.
If someone could explain in more detail what these mean, it would be very helpful. Also, once I have the bedMethyl file, is there an existing tool that makes it easy to analyse? Or do I have to extract the regions I'm interested in and check whether they are methylated?
Thank you.
P.S. Sorry for the weird formatting of the modkit output; I can't get the lines to have just one line break.
Have you looked at this page, which describes the file format? https://nanoporetech.github.io/modkit/intro_bedmethyl.html
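To make the extra columns concrete, here is a minimal Python sketch of how the 18 fields in the rows above could be read. The column names are my reading of the modkit page linked above, so treat them as an assumption and check them against your modkit version. Under that reading, m is 5mC and h is 5hmC, column 10 is the valid coverage, column 11 is the percent modified, and the seven trailing numbers are counts (modified, canonical, other modification, delete, fail, diff, no call). The names `BedMethylRecord`, `parse_bedmethyl_line` and `region_fraction` are hypothetical helpers made up for illustration; they are not part of modkit.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class BedMethylRecord:
    """One bedMethyl row; field names are assumed from the modkit docs."""
    chrom: str
    start: int             # 0-based start
    end: int               # exclusive end
    mod_code: str          # "m" (5mC) or "h" (5hmC)
    strand: str
    n_valid_cov: int       # column 10
    percent_modified: float  # column 11
    n_mod: int             # column 12: calls for this mod_code
    n_canonical: int       # column 13: unmodified calls
    n_other_mod: int       # column 14: calls for the other modification
    n_delete: int          # column 15
    n_fail: int            # column 16
    n_diff: int            # column 17
    n_nocall: int          # column 18


def parse_bedmethyl_line(line: str) -> BedMethylRecord:
    """Split one row on any whitespace and name the fields."""
    f = line.split()
    if len(f) != 18:
        raise ValueError(f"expected 18 fields, got {len(f)}")
    return BedMethylRecord(
        chrom=f[0], start=int(f[1]), end=int(f[2]), mod_code=f[3],
        strand=f[5], n_valid_cov=int(f[9]), percent_modified=float(f[10]),
        n_mod=int(f[11]), n_canonical=int(f[12]), n_other_mod=int(f[13]),
        n_delete=int(f[14]), n_fail=int(f[15]), n_diff=int(f[16]),
        n_nocall=int(f[17]),
    )


def region_fraction(records: Iterable[BedMethylRecord], chrom: str,
                    start: int, end: int,
                    mod_code: str = "m") -> Optional[float]:
    """Pooled fraction of modified calls inside [start, end) for one code.

    Returns None when no valid coverage falls inside the region.
    """
    n_mod = 0
    n_valid = 0
    for r in records:
        inside = r.chrom == chrom and r.start >= start and r.end <= end
        if inside and r.mod_code == mod_code:
            n_mod += r.n_mod
            n_valid += r.n_valid_cov
    return n_mod / n_valid if n_valid else None


# Two rows copied from the question (same position, h and m).
EXAMPLE_ROWS = [
    "NC_000001.11 10220 10221 h 1 + 10220 10221 255,0,0 1 0.00 0 0 1 0 0 0 0",
    "NC_000001.11 10220 10221 m 1 + 10220 10221 255,0,0 1 100.00 1 0 0 0 0 0 0",
]
records = [parse_bedmethyl_line(row) for row in EXAMPLE_ROWS]
print(region_fraction(records, "NC_000001.11", 10000, 11000, mod_code="m"))
```

Read this way, the h row at 10220 is not a contradiction of the m row: the single read there was called 5mC, so on the h row it is counted under "other modification" and the h percentage is 0. Pooling the counts over a region, as `region_fraction` does, weights each site by its coverage; averaging the per-site percentages instead would be a different design choice.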
Please use the 101010 button to format text as code when you want to show monospaced data.