Hi guys,
I want to analyze the microarray dataset GSE13159. Its "series matrix file" is about 900 MB and its "RAW" file is about 9.3 GB. I know that the series matrix file is processed and normalized and the RAW file is not. However, which format is more suitable for analysis and eventual publication? Do journals prefer papers that analyzed the raw files?
I would say it depends on how you are analysing the data.
For example, if I set up a pipeline where I run the analysis from scratch, I start with the raw data. However, if reproducibility is the goal, I always go for the processed data.
There is no harm in using processed data (series matrices).
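If you work in Python rather than R, a series matrix file is plain tab-delimited text: header lines start with `!`, and the expression table sits between `!series_matrix_table_begin` and `!series_matrix_table_end`, with an `ID_REF` row naming the samples. The sketch below assumes exactly that layout; the function names `parse_series_matrix` and `open_series_matrix` are hypothetical, and the toy data is made up, not taken from GSE13159.

```python
import csv
import gzip
import io


def parse_series_matrix(lines):
    """Split a GEO series matrix into header metadata and the expression table.

    lines: any iterable of text lines, e.g. an open file handle.
    Returns (metadata, samples, table):
      metadata: dict mapping "!Series_..." / "!Sample_..." keys to lists of value rows
      samples:  sample accessions taken from the "ID_REF" header row
      table:    dict mapping probe ID -> list of floats (NaN for missing values)
    """
    metadata, samples, table = {}, [], {}
    in_table = False
    for row in csv.reader(lines, delimiter="\t"):  # csv strips the double quotes
        if not row:
            continue
        key = row[0]
        if key == "!series_matrix_table_begin":
            in_table = True
        elif key == "!series_matrix_table_end":
            in_table = False
        elif in_table and key == "ID_REF":
            samples = row[1:]
        elif in_table:
            # Treat empty cells (and "null", assumed here) as missing values.
            table[key] = [float(v) if v not in ("", "null") else float("nan") for v in row[1:]]
        elif key.startswith("!"):
            metadata.setdefault(key, []).append(row[1:])
    return metadata, samples, table


def open_series_matrix(path):
    """Open a plain or gzipped (*.gz) series matrix file in text mode for csv."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, newline="")


if __name__ == "__main__":
    # Tiny made-up example in the series matrix layout (not real GSE13159 data).
    toy = io.StringIO(
        '!Series_geo_accession\t"GSE00000"\n'
        "!series_matrix_table_begin\n"
        '"ID_REF"\t"GSM0001"\t"GSM0002"\n'
        '"probe_1"\t7.5\t8.1\n'
        '"probe_2"\t5.0\t\n'
        "!series_matrix_table_end\n"
    )
    meta, samples, table = parse_series_matrix(toy)
    print(samples, table["probe_1"])
```

For a real download you would wrap the call as `with open_series_matrix(path) as fh: parse_series_matrix(fh)`. A dict of Python lists is simple but memory-hungry for a file this size, so loading into pandas, or using GEOquery in R, may be more practical.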
The RAW data in this case are the CEL files, i.e. the raw microarray intensity files. For me, at least, these would generally be the starting point of a microarray analysis. Alternatively, use the Bioconductor package GEOquery to obtain normalized data for microarray studies hosted at GEO. You can basically start an analysis from any proper source you want. A publication requires novelty, i.e. some finding that others have not made before. Given that this is the MILE study, which has been analysed countless times already, you will really have to put in some effort, I guess.
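Starting from the CEL files means doing the preprocessing yourself. In R this is usually done with Bioconductor tools such as RMA, which combines background correction, quantile normalization and median-polish summarization. To illustrate only the normalization step, here is a minimal pure-Python quantile-normalization sketch; it assumes a non-empty list of samples, each a list of (already background-corrected) probe intensities of equal length, breaks ties by probe order rather than averaging them, and is not a substitute for a full RMA pipeline.

```python
def quantile_normalize(columns):
    """Give every sample (column) the same intensity distribution.

    columns: list of samples, each a list of probe intensities of equal length.
    Returns a new list of samples; ties are ranked by probe order (stable sort).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must have the same number of probes")
    # For each sample, probe indices ordered from lowest to highest intensity.
    orders = [sorted(range(n), key=col.__getitem__) for col in columns]
    # Reference distribution: mean of the r-th smallest value across samples.
    reference = [
        sum(col[order[r]] for col, order in zip(columns, orders)) / len(columns)
        for r in range(n)
    ]
    result = []
    for order in orders:
        normalized = [0.0] * n
        # The probe with rank r in this sample receives the reference value for rank r.
        for rank, probe in enumerate(order):
            normalized[probe] = reference[rank]
        result.append(normalized)
    return result


if __name__ == "__main__":
    # Three toy samples with four probes each (made-up numbers).
    samples = [[5, 2, 3, 4], [4, 1, 4, 2], [3, 4, 6, 8]]
    for sample in quantile_normalize(samples):
        print([round(v, 3) for v in sample])
```

After normalization every sample contains the same set of values, only arranged differently across probes, which is what makes the arrays directly comparable.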