Hi, I have generated a file with the cytosine report function in Bismark so that non-CpG methylated Cs (CHH, CHG) are reported as well. The CHH file looks like this:
chr1 3000001 - 0 0 CHH CNN
chr1 3000006 + 0 0 CHH CTT
chr1 3000011 + 0 0 CHH CTA
chr1 3000015 - 0 0 CHH CAT
chr1 3000021 - 0 0 CHH CTA
chr1 3000030 - 0 0 CHH CAT
chr1 3000038 - 0 0 CHH CCA
chr1 3000039 - 0 0 CHH CCC
chr1 3000041 - 0 0 CHH CAC
chr1 3000054 + 0 0 CHH CTT
chr1 3000059 - 0 0 CHH CAA
chr1 3000061 + 0 0 CHH CCT
chr1 3000062 + 0 0 CHH CTT
chr1 3000065 + 0 0 CHH CTT
chr1 3000073 + 0 0 CHH CCT
chr1 3000074 + 0 0 CHH CTA
chr1 3000082 + 0 0 CHH CTT
chr1 3000086 - 0 0 CHH CTA
chr1 3000087 - 0 0 CHH CCT
chr1 3000091 - 0 0 CHH CAA
chr1 3000092 - 0 0 CHH CCA
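For reference, the seven columns of a Bismark cytosine report are: chromosome, position, strand, count methylated, count unmethylated, C-context and trinucleotide context. A minimal Python sketch of reading one such line follows. The names `CytosineCall` and `parse_report_line` are hypothetical helpers, not part of Bismark.

```python
from typing import NamedTuple


class CytosineCall(NamedTuple):
    """One line of a Bismark cytosine report (7 whitespace-separated columns)."""
    chrom: str         # column 1: chromosome
    pos: int           # column 2: position of the C
    strand: str        # column 3: '+' or '-'
    methylated: int    # column 4: count of methylated calls
    unmethylated: int  # column 5: count of unmethylated calls
    context: str       # column 6: CG, CHG or CHH
    trinuc: str        # column 7: trinucleotide context, e.g. CTT


def parse_report_line(line: str) -> CytosineCall:
    """Split one report line and convert the numeric columns to int."""
    chrom, pos, strand, meth, unmeth, context, trinuc = line.split()
    return CytosineCall(chrom, int(pos), strand, int(meth), int(unmeth), context, trinuc)
```

For example, `parse_report_line("chr1 3000006 + 0 0 CHH CTT")` gives a record whose coverage (`methylated + unmethylated`) is 0.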
I need to calculate the total coverage and the % methylation at each position, and the formula I know is:
`column4 of '+' strand + column5 of '+' strand + column4 of '-' strand + column5 of '-' strand = total coverage`
and the percentage is `[($4/($4+$5))*100 of '+' strand + ($4/($4+$5))*100 of '-' strand]/2`
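The strand-combining formula can be sketched in Python as below. Assumptions: records are hypothetical `(chrom, pos, strand, meth, unmeth)` tuples (columns 1 to 5); the '-' strand partner of a '+' C sits one base downstream for CpG and two bases downstream for CHG (as in the CHG excerpt in this post, e.g. 3000035 + / 3000037 -); and a strand with zero coverage is left out of the mean rather than dividing by 2 unconditionally.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical record layout: (chrom, pos, strand, meth, unmeth) = columns 1-5
Record = Tuple[str, int, str, int, int]


def strand_percent(meth: int, unmeth: int) -> Optional[float]:
    """% methylation on one strand: $4 / ($4 + $5) * 100, or None if uncovered."""
    cov = meth + unmeth
    return meth / cov * 100 if cov > 0 else None


def combine_strands(plus: Record, minus: Record) -> Tuple[int, Optional[float]]:
    """Total coverage over both strands and the mean of the per-strand percentages.

    Assumption: strands with zero coverage are skipped in the mean.
    """
    total = plus[3] + plus[4] + minus[3] + minus[4]
    pcts = [p for p in (strand_percent(plus[3], plus[4]),
                        strand_percent(minus[3], minus[4])) if p is not None]
    return total, (sum(pcts) / len(pcts) if pcts else None)


def pair_symmetric(records: List[Record], offset: int) -> Iterator[Tuple[Record, Record]]:
    """Yield ('+', '-') pairs where the '-' C sits at plus.pos + offset.

    offset = 1 for CpG; offset = 2 for CHG (e.g. 3000035 + CTG / 3000037 - CAG).
    """
    minus_at: Dict[Tuple[str, int], Record] = {
        (r[0], r[1]): r for r in records if r[2] == "-"
    }
    for r in records:
        if r[2] == "+":
            partner = minus_at.get((r[0], r[1] + offset))
            if partner is not None:
                yield r, partner
```

Looking partners up by position, rather than assuming '+' and '-' lines simply alternate, keeps the pairing correct even when a site is missing from one strand.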
but this only works for the CpG and CHG coverage files, whose output looks like this:
chr1 3000035 + 0 0 CHG CTG
chr1 3000037 - 0 0 CHG CAG
chr1 3000045 + 0 0 CHG CAG
chr1 3000047 - 0 0 CHG CTG
i.e. the + and - strands alternate. In the CHH file, however, the strands do not alternate and the spacing between positions is not consistent, so I don't know how to calculate these values. Should I treat each line on its own, i.e. take the total coverage as the sum of columns 4 and 5 without worrying about the + strand and its corresponding - strand position? Any suggestions?
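The per-line option asked about above can be sketched as follows, assuming each CHH line is handled independently (coverage = `$4 + $5`, % = `$4/($4+$5)*100`). Positions with no coverage get `None` instead of a percentage. The function name `per_position_chh` is hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple


def per_position_chh(lines: Iterable[str]) -> Iterator[Tuple[str, int, str, int, Optional[float]]]:
    """Treat every CHH line on its own: coverage = $4 + $5, % = $4 / ($4 + $5) * 100."""
    for line in lines:
        fields = line.split()
        if len(fields) < 7 or fields[5] != "CHH":
            continue  # skip blank/short lines and non-CHH contexts
        chrom, pos, strand = fields[0], int(fields[1]), fields[2]
        meth, unmeth = int(fields[3]), int(fields[4])
        cov = meth + unmeth
        pct = meth / cov * 100 if cov > 0 else None  # undefined without coverage
        yield chrom, pos, strand, cov, pct
```

This works on any iterable of lines, so an open file handle for the CHH report can be passed directly.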