Hi everybody,
I am quite new to bioinformatics and epigenetics, and right now I am stuck with my analysis. I want to analyze whether there is any correlation between changes in DNA methylation at certain loci (promoters, exons, introns, ...) and the expression of the corresponding gene between two cell types. Unfortunately I don't have the raw expression data, only log2 fold changes.
My first question: should I work with peaks (I already have peak sets for both MeDIP-seq samples, which I generated with SICER), or with the total read counts of the input and IP samples for a given gene region (for example, the 5 kb upstream of the TSS)?
If I go for the total read counts, I have to calculate a methylation level somehow. I tried to calculate such a value using this formula:
(unique_reads / total_readcount)*region_length
My idea was to apply this formula to both the IP and the input sample and subtract the input value from the IP value. Unfortunately I got a lot of negative values, and therefore it is not really possible to calculate a log2 value to correlate with the gene expression data. Is there another way to subtract the background noise from my IP sample using the input control?
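To make the problem concrete, here is a minimal Python sketch of the calculation as I described it. The function names and the read counts are made up for illustration only (they are not from my real data), and this is just my current approach, not a recommended method.

```python
import math


def scaled_signal(unique_reads, total_readcount, region_length):
    """The formula above: (unique_reads / total_readcount) * region_length."""
    return (unique_reads / total_readcount) * region_length


def input_subtracted(ip_reads, ip_total, input_reads, input_total, region_length):
    """IP value minus input value for one region."""
    ip_value = scaled_signal(ip_reads, ip_total, region_length)
    input_value = scaled_signal(input_reads, input_total, region_length)
    return ip_value - input_value


def safe_log2(value):
    """log2 is undefined for values <= 0, so return None in that case."""
    return math.log2(value) if value > 0 else None


# Hypothetical counts for two 5 kb regions, 10 million total reads per library.
enriched = input_subtracted(120, 10_000_000, 40, 10_000_000, 5000)
depleted = input_subtracted(30, 10_000_000, 40, 10_000_000, 5000)

print(enriched, safe_log2(enriched))
print(depleted, safe_log2(depleted))
```

Whenever the input has more normalized reads than the IP in a region, the difference is negative and the log2 step fails, which is exactly what happens for many of my regions.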
Any other ideas on how I could correlate these two datasets with each other?
Thanks for any suggestions.
Flo