Hi all,
I am trying to decode the SAM MM and ML tags read by read and build a table of methylation positions for each read. For example, from an aligned SAM file:
Nanopore_Sequence_Example 4 * 0 0 * * 0 0
GTTATGTAACCTACTTGGTTCCATTACGTATTGCTGGTGCTGAAGATTGTAGGTGTCTTTGTGCAGAGTGTATGATATACACGGCGGTGCTGAAGAAAGTTATTGCGGGTGTATTTGTGCAGAAGTATATGATGTGCGCGGGCGGAGGTGCTGAAGAAAGTTGTCGGTGTCTTTGTGCAGAAGTATATGATGGCGAGGTGTTGAAGAAAGTTGTCGGTGTCTTTGTGCAGAAGTATATGATGTGCGCGGGCGGATCCGCCCGCGCATCCTTCTGCGCAAT
"&+*+*)$#$%'&&')%&'(,)))*1555:>B@7777BBCD10.*.%%&)$$$*14//0.,.-..'(%(%&''..211--'$$%&+))56<;;998:44892.-,+)('&*)'&&((,++/0064:385566;>A>=6@<@AA?;:::=>>?==7=?@=<>>;BA@??@?=:;;;7011+,,).-,++++-&$$%)(,)*,)$%'(((&&'(&&%%%&+0///20877656??BAA@@ABBEFGC>=57793222532110,50-++$$$%&),,,+())
rl:i:0
MM:Z:C+h?,5,5,0,1,1,0,0,1,2,0,2,0,0,1,2,0,4;C+m?,5,5,0,1,1,0,0,1,2,0,2,0,0,1,2,0,4;
ML:B:C,159,6,135,2,7,9,3,4,13,11,6,22,6,1,2,2,218,0,4,19,1,2,7,4,1,0,1,4,15,11,2,1,0,0
I'm trying to create a table/list that would look like this:
Read: Nanopore_Sequence_Example
Methylation position 1: 5mC Methylated
Methylation position 2: Unmethylated
Methylation position 3: 5hmC Methylated
I am new to methylation calling, so any help would be amazing.
I made changes to how I decode m6A events and updated the links in the answer above. The Python code is in a separate repository. The JavaScript code is a bit more fleshed out. Hopefully this is useful for others.
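For anyone who wants to see the first decoding step spelled out, here is a minimal sketch in plain Python. It is not the code from the repository mentioned above, and the function names (decode_mm_ml, call_table), the 0.5 threshold and the midpoint probability conversion are my own assumptions. It also assumes the read is unmapped or on the forward strand (so SEQ is in the original sequenced orientation), that each MM entry carries a single modification code on the + strand, and that SEQ is not hard-clipped.

```python
from collections import defaultdict

# Hypothetical display names for the two codes used in the question.
NAMES = {"m": "5mC", "h": "5hmC"}


def decode_mm_ml(seq, mm, ml):
    """Return (read_pos, mod_code, probability) tuples, 0-based read positions.

    seq: SEQ field, mm: MM tag value (e.g. "C+h?,5,5,0;C+m?,5,5,0;"),
    ml: list of ML integers (0-255), in the same order as the MM entries.
    """
    calls = []
    ml_index = 0
    for entry in mm.split(";"):
        if not entry:
            continue
        header, *skips = entry.split(",")
        base = header[0]                  # canonical base, e.g. "C"
        code = header[2:].rstrip("?.")    # modification code, e.g. "m" or "h"
        base_positions = [i for i, b in enumerate(seq) if b == base]
        idx = -1
        for skip in skips:
            # Each number says how many occurrences of `base` to skip
            # before the next reported one.
            idx += int(skip) + 1
            # Assumed conversion: midpoint of the ML bin, scaled to 0-1.
            prob = (ml[ml_index] + 0.5) / 256
            calls.append((base_positions[idx], code, prob))
            ml_index += 1
    return calls


def call_table(calls, threshold=0.5):
    """Pick the most probable modification per position (threshold is arbitrary)."""
    by_pos = defaultdict(dict)
    for pos, code, prob in calls:
        by_pos[pos][code] = prob
    rows = []
    for pos in sorted(by_pos):
        code, prob = max(by_pos[pos].items(), key=lambda kv: kv[1])
        if prob >= threshold:
            rows.append((pos, f"{NAMES.get(code, code)} Methylated"))
        else:
            rows.append((pos, "Unmethylated"))
    return rows


if __name__ == "__main__":
    toy_calls = decode_mm_ml("ACCGCTC", "C+h?,1,1;C+m?,1,1;", [3, 4, 200, 10])
    for pos, label in call_table(toy_calls):
        print(f"Methylation position {pos}: {label}")
```

For the read in the question you would pass the SEQ string, the MM value without the leading MM:Z: and the ML numbers as a list of integers. The positions that come back are positions in the read, not in the reference.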
It seems that modified_bases in pysam (0.22.0) doesn't take the CIGAR string into account, though.
That's correct. The second step is to process the CIGAR string so that each modified base is assigned to the correct location.
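A rough sketch of that second step, assuming you already have 0-based read positions for the modified bases and a mapped forward-strand read with a real CIGAR string (the example read in the question is unmapped, so its CIGAR is * and there is nothing to project). The function name read_to_ref_positions is hypothetical, and ref_start is the 0-based alignment start, i.e. the SAM POS field minus 1.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def read_to_ref_positions(cigar, ref_start):
    """Map each 0-based index in SEQ to a 0-based reference position.

    Bases that are inserted or soft-clipped have no reference position
    and are mapped to None.
    """
    mapping = []
    ref_pos = ref_start
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":        # consumes both read and reference
            mapping.extend(range(ref_pos, ref_pos + n))
            ref_pos += n
        elif op in "IS":       # consumes read only
            mapping.extend([None] * n)
        elif op in "DN":       # consumes reference only
            ref_pos += n
        # H and P consume neither read bases in SEQ nor reference bases
    return mapping


if __name__ == "__main__":
    mapping = read_to_ref_positions("2S3M1I2M2D1M", 100)
    read_pos = 6  # e.g. a modified C found at index 6 of SEQ
    print(read_pos, "->", mapping[read_pos])
```

Looking up each modified read position in the returned list gives the reference coordinate, or None when the base sits in an insertion or soft clip. Reverse-strand reads need extra care, because SEQ is stored reverse-complemented while the MM counts follow the original sequenced orientation.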
You could try the modkit program from Oxford Nanopore, which takes indels into account. It is a powerful tool.