Hi everyone!
I'm currently parsing a SAM file to extract methylation sites from a nanopore sequencing run. According to the SAM documentation, this is possible thanks to the MM:Z and ML:B:C tags.
Below you can find a read and the MM:Z and ML:B:C tags extracted from my SAM file:
TGATCGCGCGGACCTGTTCTACCAGGTAGGTCACCGGGTCAAATGATATTTTGATGGTGTTGGACACCACCGTCTGGCTGGCGCTCAGGGTGCCGGAGTTCAGAGCGTAGATGAATGTCTCAAACGCGGAGGATTTCTCGCCTCCCAGCATGTAAATTGGCCACTGCAGGGCGCTGCTCTTGTCAGTATAGCGGAAATGTATGGGGAGCGGCATATTTCGTTAAGGACGGTTGCAATGGCTACCCCAGAATCTTGGCTGCTGTTGCCTTCGACCGCCGCGTTCACGCGCTCAATTGTGGGGTGGAGCACAGCGATCGCTGAAGCGGCGCACGAAAAACGCGAAAGCGTTTCACGATAAATGCGAAAACCACAAAGACACCGACAACTTTCTTCGAGCAAATTCACCTACGCCAGCAACTGAACGAAGTAC
MM:Z:C+m?,1,2,1,0,1,1,0,1,0,0,0,1,2,0,9,0,0,0,0,0,10,2,1,7,4,8,3,0,3,4,4,4,9,6,0,0;
ML:B:C,234,246,49,84,5,0,11,228,254,0,0,2,8,1,3,1,1,0,146,0,10,0,1,115,19,0,167,19,0,121,21,9,188,112,93,6
The sequence is 430 bases long. The MM:Z tag gives information about the presence of 5-methylcytosine. The modification status of the first cytosine is unknown, the second is called, the third and the fourth are not called, and so on. So the MM:Z tag covers SUM(numeric values of MM:Z) + number of numeric values = 84 + 36 = 120 cytosines. The ML:B:C tag holds 36 values, one for each entry in MM:Z. However, my read contains only 110 cytosines. How can this difference arise?
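For reference, the count above can be reproduced with a short Python sketch. It assumes the MM:Z tag holds a single modification group, as in this record, and the helper names (parse_mm, parse_ml, cytosines_needed, called_ordinals) are made up for illustration, not part of any library.

```python
from itertools import accumulate

MM = "MM:Z:C+m?,1,2,1,0,1,1,0,1,0,0,0,1,2,0,9,0,0,0,0,0,10,2,1,7,4,8,3,0,3,4,4,4,9,6,0,0;"
ML = "ML:B:C,234,246,49,84,5,0,11,228,254,0,0,2,8,1,3,1,1,0,146,0,10,0,1,115,19,0,167,19,0,121,21,9,188,112,93,6"


def parse_mm(tag):
    """Return (modification code, skip counts) for a single-group MM:Z tag."""
    body = tag.split(":", 2)[2].rstrip(";")
    code, *deltas = body.split(",")
    return code, [int(d) for d in deltas]


def parse_ml(tag):
    """Return the list of 0-255 probability values of an ML:B:C tag."""
    values = tag.split(":", 2)[2].split(",")
    return [int(v) for v in values[1:]]  # values[0] is the array type "C"


def cytosines_needed(deltas):
    """Each entry skips `delta` cytosines, then refers to one more cytosine."""
    return sum(deltas) + len(deltas)


def called_ordinals(deltas):
    """1-based rank, among the cytosines, of every called position."""
    return list(accumulate(d + 1 for d in deltas))


code, deltas = parse_mm(MM)
probs = parse_ml(ML)

print(code, len(deltas), len(probs), cytosines_needed(deltas))  # C+m? 36 36 120
print(called_ordinals(deltas)[:4])  # [2, 5, 7, 8]
```

The last called position is cytosine number 120, which is what should be compared with the number of C in the read (for example with seq.count("C"), where seq is the SEQ string above).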
Best regards.
Antoine
Hi, did you find an answer to this question? I have the exact same problem and just don't understand why there are more Cs than indicated.
Can you share your BAM file and point out which records have this issue?