Hello friends, my goal is to normalize a fold-change matrix so that I can cluster samples together.
My data comprise MeRIP-seq, which is similar to ChIP-seq but for detecting methylation. I have analyzed the data and used MACS for peak calling. So now I have a [samples * peak positions] matrix, where the values are the fold changes from the MACS callpeak command.
I want to try different dimensionality reductions and clustering methods, in order to cluster the samples together, but I need to normalize the matrix first.
I did not find a method that I feel comfortable with yet.
I have tried DESeq2, which I know is not meant for this type of data, and indeed the results do not look good.
I tried fitting my data to different distributions with no success.
Do you have any suggestions?
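For concreteness, here is a minimal sketch of one common starting point for a matrix like this: log2-transform the fold changes, center each peak across samples, and project the samples with PCA. The function name `log_center_pca`, the pseudocount of 1, and the toy data are all assumptions made for illustration; nothing here comes from MACS itself.

```python
import numpy as np

def log_center_pca(fc, n_components=2, pseudocount=1.0):
    """Log-transform a [samples x peaks] fold-change matrix and run PCA via SVD.

    Hypothetical helper; the pseudocount is an arbitrary choice.
    """
    fc = np.asarray(fc, dtype=float)
    log_fc = np.log2(fc + pseudocount)        # compress the long right tail
    centered = log_fc - log_fc.mean(axis=0)   # center each peak across samples
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates
    explained = (s ** 2) / np.sum(s ** 2)            # variance fraction per PC
    return scores, explained[:n_components]

# Synthetic toy data (not real MeRIP-seq): 6 samples x 50 peaks,
# where the last 3 samples have 10 peaks with 4x higher fold change.
rng = np.random.default_rng(0)
group_a = rng.lognormal(mean=1.0, sigma=0.3, size=(3, 50))
group_b = rng.lognormal(mean=1.0, sigma=0.3, size=(3, 50))
group_b[:, :10] *= 4
fc_matrix = np.vstack([group_a, group_b])

scores, explained = log_center_pca(fc_matrix)
print(scores.round(2))
print(explained.round(2))
```

The resulting `scores` array can be passed to whichever clustering method is being tried; whether a log transform suits real fold-change values would need to be checked on the data.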
The typical input matrix contains read counts rather than fold changes. I cannot say that I have ever seen a ChIP-seq-like analysis with fold changes as input. It is also, in my opinion, not recommended, since fold changes without statistics are meaningless: an FC can be large purely because of noise, without the data actually supporting it. Consider using counts, e.g. obtained with featureCounts.
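To illustrate why a fold change alone can mislead, here is a small sketch comparing two regions with the same FC of 2 but very different read support. It assumes equal sequencing depth in IP and input and uses a simple two-sided binomial test conditioned on the total reads; the function name `binomial_two_sided_p` is hypothetical, and this is not how MACS or DESeq2 compute significance.

```python
from math import comb

def binomial_two_sided_p(ip_reads, input_reads):
    """Two-sided exact binomial p-value under the null that IP and input
    reads are equally likely (assumes equal sequencing depth)."""
    n = ip_reads + input_reads
    k = max(ip_reads, input_reads)
    # Upper tail P(X >= k) for X ~ Binomial(n, 0.5), kept as exact integers
    upper_tail = sum(comb(n, i) for i in range(k, n + 1))
    # Symmetric null, so double the tail and cap at 1
    return min(1.0, 2 * upper_tail / 2 ** n)

# Both regions have a fold change of 2
p_small = binomial_two_sided_p(4, 2)      # 4 IP reads vs 2 input reads
p_large = binomial_two_sided_p(400, 200)  # 400 IP reads vs 200 input reads

print(p_small)  # 0.6875
print(p_large)
```

With 4 versus 2 reads the difference is entirely compatible with noise, while 400 versus 200 reads is strongly supported, even though the fold change is identical.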
Thanks for the reply.
I will look into it more, but the issue with a read-count matrix is that a read count in the IP is also meaningless unless it is compared with the read count of the same segment in the input. And every method I know for comparing them gives you a fold change.
One possible solution I thought of is to convert the fold-change matrix into 1, 0, and -1, representing peak in IP, no peak, and peak in input.
However, I would love to find a way that also incorporates the size of the peaks.
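A rough sketch of both ideas, assuming per-peak read counts are available for IP and input: a depth-normalized log2 enrichment keeps the sign and the size of each peak, and the 1/0/-1 encoding is just a thresholded version of it. The function names, the pseudocount, and the threshold of 1 (a twofold change) are arbitrary assumptions, and library size is approximated here by the sum of reads in peaks.

```python
import numpy as np

def log2_enrichment(ip_counts, input_counts, pseudocount=1.0):
    """Signed log2(IP / input) per peak after counts-per-million scaling.

    Inputs are [samples x peaks] arrays; library size is approximated
    by each sample's row sum (an assumption of this sketch).
    """
    ip = np.asarray(ip_counts, dtype=float)
    inp = np.asarray(input_counts, dtype=float)
    ip_cpm = ip / ip.sum(axis=1, keepdims=True) * 1e6
    inp_cpm = inp / inp.sum(axis=1, keepdims=True) * 1e6
    return np.log2((ip_cpm + pseudocount) / (inp_cpm + pseudocount))

def ternary_calls(log2_enr, threshold=1.0):
    """Encode enrichment as 1 (peak in IP), -1 (peak in input), 0 (neither)."""
    calls = np.zeros(log2_enr.shape, dtype=int)
    calls[log2_enr >= threshold] = 1
    calls[log2_enr <= -threshold] = -1
    return calls

# Toy example: one sample, three peaks
ip_toy = [[800, 100, 100]]
input_toy = [[100, 800, 100]]

enrichment = log2_enrichment(ip_toy, input_toy)
calls = ternary_calls(enrichment)
print(enrichment.round(2))
print(calls)  # [[ 1 -1  0]]
```

The continuous `enrichment` matrix is the version that retains peak size; the ternary matrix discards it, so the two could be compared to see how much the magnitudes actually change the clustering.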