I'm struggling to understand how to tell what processing a GSM of methylation data has undergone. How can I determine, concretely, whether the methylation values have been normalized, and if they have, exactly which method was applied? Is it the case that:
- GSM files should always be raw data.
- You have whatever information is shared on the GEO accession page and nothing else, leaving you to guess what 'normalized beta values' actually means.
- There is a programmatic way to tell exactly what has been done from the GSM file itself.
I'm a bit at sea, because I want to compare several datasets to reproduce another scientist's experimental findings (for verification). It seems to me that the information I would obtain from downloading only the GSE/GSM files is ambiguous, and that this ambiguity would confound any cross-study analysis. For example, the pData of the sample I have open in front of me has a column labeled "data_processing" which simply contains the word 'minfi', and this is the only information I can see indicating what kind of normalization the samples have undergone.
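To make the example concrete, here is a minimal sketch of the kind of check I mean, written in Python rather than R so that it stands alone. It assumes the sample record has been downloaded as GEO SOFT-format text, where (as far as I understand) the `!Sample_data_processing` lines carry the same free text that shows up in the pData column. The accession, title, and values in the excerpt are made up for illustration, and `sample_attributes` is a hypothetical helper, not part of any GEO tool.

```python
from collections import defaultdict

# Hypothetical excerpt of a GSM record in GEO SOFT format.
# The accession, title and values are invented for illustration only.
SOFT_TEXT = """\
^SAMPLE = GSM0000000
!Sample_title = example sample
!Sample_data_processing = minfi
!Sample_platform_id = GPL13534
#VALUE = normalized beta values
!sample_table_begin
ID_REF\tVALUE
cg00000029\t0.512
!sample_table_end
"""


def sample_attributes(soft_text):
    """Collect '!Sample_*' attribute lines into a dict of lists.

    An attribute such as Sample_data_processing may be repeated,
    so every value is kept in the order it appears.
    """
    attrs = defaultdict(list)
    for line in soft_text.splitlines():
        if line.startswith("!Sample_") and " = " in line:
            key, value = line[1:].split(" = ", 1)
            attrs[key].append(value.strip())
    return dict(attrs)


attrs = sample_attributes(SOFT_TEXT)
print(attrs.get("Sample_data_processing"))  # prints ['minfi']
```

All this recovers is the submitter's free-text description, which is exactly the problem: nothing here says which minfi preprocessing function, if any, was applied.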