I want to write a simple program to calculate the fold change from Affymetrix SOFT files. Should normalisation be carried out on SOFT files (and if so, how?), or are SOFT files already normalised?
Edited after a comment by Neil, who pointed out that a SOFT file can in fact contain data.
The format of the SOFT file is described here: http://www.ncbi.nlm.nih.gov/geo/info/soft2.html. For Affymetrix arrays it will normally not contain expression data. Those data are held in external raw data files, e.g. Affymetrix .CEL files, which are referenced within the SOFT file by the !Sample_supplementary_file attribute.
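If you want to locate those raw files programmatically, a minimal sketch is to scan the SOFT text for `!Sample_supplementary_file` lines. This assumes the `!Attribute = value` layout from the SOFT documentation, and treats a value of `NONE` as "no file" (an assumption based on how GEO records usually look). The function name and the example URL are hypothetical.

```python
def supplementary_files(soft_text):
    """Return the values of every !Sample_supplementary_file line in a SOFT document."""
    urls = []
    for line in soft_text.splitlines():
        # startswith also matches numbered variants such as !Sample_supplementary_file_1
        if line.startswith("!Sample_supplementary_file"):
            _, _, value = line.partition("=")
            value = value.strip()
            # Assumption: GEO writes NONE when no supplementary file exists
            if value and value.upper() != "NONE":
                urls.append(value)
    return urls


# Hypothetical SOFT fragment for illustration only
example = """^SAMPLE = GSM_EXAMPLE
!Sample_title = example sample
!Sample_supplementary_file = ftp://example.org/sample1.CEL.gz
"""
print(supplementary_files(example))
```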
The raw files are usually bundled in a separate archive; for Affymetrix arrays these will normally be .CEL files, which you would need to normalize yourself. (You could use our arrayanalysis.org for that, but of course there are many other options.)
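To give a feel for what normalization does, here is a sketch of quantile normalization, one step used inside common Affymetrix pipelines such as RMA. It is not a full CEL-file workflow: reading CEL files, background correction and probe summarization need dedicated tools (e.g. Bioconductor's affy package). The input here is assumed to be one list of intensities per array, all the same length, and ties are simply broken by original order.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length intensity columns (one per array)."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must have the same number of probes")
    # Sort each array, then average the sorted values across arrays rank by rank
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)]
    normalized = []
    for col in columns:
        # Indices of this array's values in ascending order (ties keep original order)
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            # Replace each value with the cross-array mean for its rank
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

After this step every array has the same intensity distribution, so remaining differences between arrays reflect rank order rather than overall brightness.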
Is that strictly true? SOFT files can contain data tables after a "!Sample_table_begin" line, as in this example file: http://www.ncbi.nlm.nih.gov/geo/info/soft_ex_affy.txt. The data table may or may not be normalized, but it's true that you need the raw CEL files for true normalization.
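A minimal sketch of pulling such a table out of a single-sample SOFT record, assuming (as in the linked example) a tab-delimited table whose first line is the column header, closed by "!Sample_table_end". Family files containing several samples would need one call per sample record; the function name is hypothetical.

```python
def read_sample_table(soft_text):
    """Return (header, rows) for the first data table in a SOFT sample record."""
    header, rows, inside = None, [], False
    for line in soft_text.splitlines():
        if line.startswith("!Sample_table_begin"):
            inside = True
            continue
        if line.startswith("!Sample_table_end"):
            break
        if inside and line.strip():
            fields = line.split("\t")
            if header is None:
                header = fields  # first table line holds the column names
            else:
                rows.append(dict(zip(header, fields)))
    return header, rows


# Hypothetical fragment shaped like the linked example
example = "!Sample_table_begin\nID_REF\tVALUE\nprobe_1\t150.2\nprobe_2\t88.0\n!Sample_table_end\n"
header, rows = read_sample_table(example)
print(header, rows)
```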
You need to look at the metadata in the SOFT file. In the example file linked above, for instance, you'll see "#VALUE = MAS5-calculated Signal intensity", which tells you that the arrays were processed with the MAS5 algorithm. Unprocessed values are usually described as "raw".
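A sketch of both steps: reading the "#VALUE" description, then computing a log2 fold change. It assumes the values are either on a linear scale (MAS5 signal is usually reported that way, but some submitters log-transform it, so check the description) or already log2-transformed, which you would flag yourself. The function names are hypothetical, and comparing values across arrays only makes sense if they were processed the same way.

```python
import math


def value_description(soft_text):
    """Return the text after '#VALUE =' in a SOFT record, or None if absent."""
    for line in soft_text.splitlines():
        if line.startswith("#VALUE"):
            _, _, desc = line.partition("=")
            return desc.strip()
    return None


def log2_fold_change(treated, control, already_log2=False):
    """Log2 fold change of treated over control for one probe set."""
    if already_log2:
        # On the log2 scale a ratio becomes a difference
        return treated - control
    if treated <= 0 or control <= 0:
        raise ValueError("linear-scale signals must be positive to take a log")
    return math.log2(treated / control)


print(value_description("#VALUE = MAS5-calculated Signal intensity\n"))
print(log2_fold_change(400.0, 100.0))
```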
Where are the SOFT files from? If they're from a public repository, there should be metadata describing how they were processed. Can you give us an accession number?