Hi
I am trying to use a public dataset that does not have coverage information:
chr1 10468 10470 0.895333
chr1 10470 10472 0.895967
chr1 10483 10485 0.99393
What is the best/ideal way to calculate regional-level methylation information?
There are certain tools like roimethstat that I use in methpipe but they require coverage information.
So I was wondering what would be the ideal way to get a normalized value for a region (say, an exon). Should I just add all the values and divide by the total number of CpG positions, or by the total length of the region?
Where does this data come from? It looks to me like perhaps it should be organized as follows:
If this is the case, it looks like you already have percentage methylation values (last column - all regions listed would be unmethylated) for segments.
There might be an arbitrary ID in the first column, or possibly a sample ID (the 2nd and 3rd, as well as the 4th and 5th, segments appear to be strongly overlapping). This would make a difference in the interpretation.
If you were working with raw data, the best tool would depend upon the technology being used.
If you are working with processed data, I think you would want to look for a tool that operates on genomic intervals (for example, one that looks for overlap between your regions and a set of promoter locations).
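As a rough illustration of the interval-overlap idea, here is a minimal Python sketch that assigns each CpG row to the regions it overlaps and reports the unweighted mean per region. It assumes BED-style half-open coordinates, and the function name, region names, and region coordinates are made up for the example. Without coverage there is nothing to weight by, so it divides by the number of CpGs observed in the region. Dividing by the region length instead would count bases that are not CpGs and pull the value down.

```python
from collections import defaultdict

def mean_methylation_per_region(sites, regions):
    """Unweighted mean methylation level per region.

    sites:   iterable of (chrom, start, end, level)
    regions: iterable of (chrom, start, end, name)
    Coordinates are assumed to be half-open (BED-style).
    Returns {name: (n_sites, mean_level)}; mean_level is None if no site overlaps.
    """
    # Group sites by chromosome so each region only scans its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end, level in sites:
        by_chrom[chrom].append((start, end, level))

    summary = {}
    for chrom, r_start, r_end, name in regions:
        # Two half-open intervals overlap when each starts before the other ends.
        levels = [lvl for s, e, lvl in by_chrom[chrom] if s < r_end and e > r_start]
        mean_level = sum(levels) / len(levels) if levels else None
        summary[name] = (len(levels), mean_level)
    return summary

# The three rows from the question.
sites = [
    ("chr1", 10468, 10470, 0.895333),
    ("chr1", 10470, 10472, 0.895967),
    ("chr1", 10483, 10485, 0.99393),
]
# Hypothetical regions, for illustration only.
regions = [
    ("chr1", 10460, 10475, "region_A"),
    ("chr1", 10480, 10490, "region_B"),
    ("chr1", 20000, 20100, "region_C"),
]

summary = mean_methylation_per_region(sites, regions)
for name, (n_sites, mean_level) in summary.items():
    print(name, n_sites, mean_level)
```

Reporting the number of CpGs alongside the mean makes it easier to filter out regions supported by only one or two sites.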
I apologize, I had mistakenly posted the processed values that I calculated, which are basically the averages over a region. I have updated the question with the correct info.
The regions in the updated example look small - each is 3 bp. If they are all like this, you could take the middle nucleotide (such as 10469 for the first row) and force it into the format expected by tools that analyze percentage methylation. For example, I think you can make it look like the .bed file needed for methylKit. COHCAP also accepts just percentage methylation values (in fact, that is the only thing it directly works with), but that is meant for targeted BS-Seq (so, you would also need to provide a pre-defined list of regions).
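A minimal sketch of the middle-nucleotide conversion, assuming whitespace-separated rows like those in the question and assuming the fourth column is a fraction between 0 and 1 (so it is multiplied by 100 to get a percentage). The function name and the three-column output are hypothetical. The exact columns a given tool expects should be checked against its documentation.

```python
def to_single_positions(lines):
    """Collapse each short interval to its middle position.

    Each input line is expected to look like: chrom start end level
    Returns a list of (chrom, position, percent_methylation) tuples.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom = fields[0]
        start, end = int(fields[1]), int(fields[2])
        level = float(fields[3])
        middle = (start + end) // 2  # e.g. 10468 and 10470 give 10469
        # Assumption: level is a 0-1 fraction, converted here to a percentage.
        rows.append((chrom, middle, round(level * 100, 4)))
    return rows

example = [
    "chr1 10468 10470 0.895333",
    "chr1 10470 10472 0.895967",
    "chr1 10483 10485 0.99393",
]

positions = to_single_positions(example)
for chrom, pos, pct in positions:
    print(chrom, pos, pct, sep="\t")
```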