I have some data derived from a microarray. I don't know what chip it's from or anything like that. For each condition, I just have a list of transcript names and three numbers next to them representing three replicates. The numbers are supposed to be on a log scale. I assume it's base 2, but I have no idea.
I want to translate the information into information about genes. How do I take the numbers from multiple transcripts and combine them into a single score for a gene?
Update: The platform is Agilent. It was a custom chip made for the C. elegans genome. I have good information about the conditions it was collected under, such as the strain and growth stage.
I noticed that the columns say 'qnorm'.
Update 2: From googling, I'm guessing qnorm means quantile normalization. Can somebody confirm this? (I realize this might be completely vague information and ultimately not confirmable.) However, does this seem like a reasonable assumption?
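One rough way to test the quantile-normalization guess: after quantile normalization, every column ends up with (nearly) the same set of values, just assigned to different rows. The sketch below checks that. The function name and the toy replicate values are made up for illustration, and the check is only approximate, since ties or rows filtered out after normalization would make the columns differ slightly.

```python
def looks_quantile_normalized(columns, tol=1e-6):
    """Return True if all columns share the same sorted values (within tol)."""
    sorted_cols = [sorted(col) for col in columns]
    reference = sorted_cols[0]
    for col in sorted_cols[1:]:
        if len(col) != len(reference):
            return False
        # Compare the two value distributions rank by rank
        if any(abs(a - b) > tol for a, b in zip(reference, col)):
            return False
    return True


# Hypothetical replicate columns, invented for illustration only
rep1 = [5.0, 7.5, 9.0, 11.0]
rep2 = [9.0, 5.0, 11.0, 7.5]
rep3 = [7.5, 11.0, 5.0, 9.0]

print(looks_quantile_normalized([rep1, rep2, rep3]))
```

If the normalization was done across all conditions together, the check would need to run on all columns at once rather than on one condition's replicates.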
Are you even sure that it is microarray data (and not, for example, RNA-seq data)? What kind of transcript identifiers do you have? How are the samples described? What does the distribution of values look like? With this kind of information we may be able to make an educated guess at the platform. Until we feel confident about the platform, it may be inappropriate to summarize values to the gene level. For example, if they are actually Cufflinks-derived RNA-seq data, it might be more appropriate to sum transcript values for all isoforms of a gene. A safer bet may be to proceed with transcript-level analysis of the values you have, and then map to genes for downstream interpretation. What kind of analysis do you hope to perform with the gene-level values?
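To make the two options concrete, here is a minimal sketch of collapsing transcript values to genes, assuming the values really are log2 (not confirmed) and that you can build a transcript-to-gene mapping. The function name, identifiers, and numbers are hypothetical. The "sum" mode converts back to linear scale before adding isoforms, which only makes sense for abundance-like data such as RNA-seq estimates; the "median" mode is a common, more conservative choice for probe intensities.

```python
import math
from collections import defaultdict
from statistics import median


def summarize_to_genes(log2_values, transcript_to_gene, method="median"):
    """Collapse transcript-level log2 values to one value per gene.

    Run this once per replicate column. Assumes base-2 logs.
    """
    grouped = defaultdict(list)
    for transcript, value in log2_values.items():
        gene = transcript_to_gene.get(transcript)
        if gene is not None:  # skip transcripts with no known gene
            grouped[gene].append(value)

    summary = {}
    for gene, values in grouped.items():
        if method == "median":
            summary[gene] = median(values)
        elif method == "sum":
            # Undo the log, add isoforms on the linear scale, log again
            summary[gene] = math.log2(sum(2.0 ** v for v in values))
        else:
            raise ValueError(f"unknown method: {method}")
    return summary


# Hypothetical identifiers and values, invented for illustration only
replicate = {"tx1a": 3.0, "tx1b": 3.0, "tx2": 5.0}
mapping = {"tx1a": "geneA", "tx1b": "geneA", "tx2": "geneB"}

print(summarize_to_genes(replicate, mapping, method="median"))
print(summarize_to_genes(replicate, mapping, method="sum"))
```

Note that averaging or summing the log values directly is not the same as summing on the linear scale, so the choice of method should follow from what the platform actually measures.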
@Obi: Thanks for the feedback. I will modify my question.