Hi everyone,
My lab is using gene expression data generated by Illumina HumanHT-12 v3 Expression BeadChips. As advertised by the company, this product has 48,000+ probes for 25,000 genes. I have never used expression data before and would like to cluster genes based on their expression. The data have already been normalized and corrected for batch effects.
The current file format is:
ProbeID Sample1 Sample2
I would like to get the following format:
GeneID Sample1 Sample2
It seems that some genes have more probes than others. Moreover, there can be multiple transcripts for a given gene. Could someone please give me a general idea of how to get the desired format?
Thank you for your time.
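For the first step, going from ProbeID rows to GeneID rows, a minimal sketch could look like the one below. It assumes you already have a probe-to-gene lookup taken from the array's annotation file. The probe IDs, gene names, expression values, and the function name annotate_probes are all made up for illustration and do not come from the real chip annotation.

```python
import csv
import io

# Hypothetical example data: tab-separated, one row per probe.
EXPRESSION_TSV = (
    "ProbeID\tSample1\tSample2\n"
    "ILMN_0001\t7.2\t7.9\n"
    "ILMN_0002\t6.1\t6.4\n"
    "ILMN_0003\t9.0\t8.7\n"
    "ILMN_0004\t5.5\t5.6\n"
)

# Hypothetical probe-to-gene lookup; in practice this would be built
# from the annotation file that matches the array version.
PROBE_TO_GENE = {
    "ILMN_0001": "GENE_A",
    "ILMN_0002": "GENE_A",  # a second probe for the same gene
    "ILMN_0003": "GENE_B",
}


def annotate_probes(expression_tsv, probe_to_gene):
    """Return (sample_names, rows), where each row is (gene, probe, values).

    Probes without a gene in the lookup are dropped.
    """
    reader = csv.DictReader(io.StringIO(expression_tsv), delimiter="\t")
    samples = [name for name in reader.fieldnames if name != "ProbeID"]
    rows = []
    for record in reader:
        gene = probe_to_gene.get(record["ProbeID"])
        if gene is None:
            continue  # unannotated probe, skipped in this sketch
        values = [float(record[name]) for name in samples]
        rows.append((gene, record["ProbeID"], values))
    return samples, rows


samples, rows = annotate_probes(EXPRESSION_TSV, PROBE_TO_GENE)
for gene, probe, values in rows:
    print(gene, probe, values)
```

Keeping the probe ID next to the gene ID means that genes with several probes stay distinguishable, so the decision about whether to merge them can be made later.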
Thank you very much for your time! I am trying it right now. I was wondering: when someone wants to cluster genes, don't they need one expression value for each gene? If so, how can you incorporate the expression of several probes within a gene into one value?
Thank you!
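If one value per gene is really needed, two simple ways of collapsing probes are averaging all probes of a gene, or keeping only the probe that varies most across samples. The sketch below shows both on made-up numbers. The function names and data are hypothetical, and neither rule is claimed to be the right choice for this chip; averaging in particular can blur probes that target different transcripts.

```python
from collections import defaultdict
from statistics import mean, pvariance

# Hypothetical (gene, per-sample values) rows, one row per probe.
EXAMPLE_ROWS = [
    ("GENE_A", [7.0, 8.0]),
    ("GENE_A", [6.0, 6.5]),
    ("GENE_B", [9.0, 8.5]),
]


def collapse_by_mean(rows):
    """Average all probes of a gene, sample by sample."""
    grouped = defaultdict(list)
    for gene, values in rows:
        grouped[gene].append(values)
    return {
        gene: [mean(column) for column in zip(*probe_values)]
        for gene, probe_values in grouped.items()
    }


def collapse_by_most_variable_probe(rows):
    """Keep, for each gene, the probe with the largest variance across samples."""
    best = {}
    for gene, values in rows:
        if gene not in best or pvariance(values) > pvariance(best[gene]):
            best[gene] = values
    return best


print(collapse_by_mean(EXAMPLE_ROWS))
print(collapse_by_most_variable_probe(EXAMPLE_ROWS))
```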
I am not sure of this, but I would not try to summarize the different probes of a gene into a single value since, as you mentioned, they could target different transcripts of the same gene. It's better to continue with the normalized expression values of the individual probes for clustering.
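To illustrate why probe-level clustering can be informative, here is a small sketch that labels each probe as GENE|PROBE and computes a correlation distance (1 minus Pearson correlation) between probe profiles. The data, labels, and function names are hypothetical, and correlation distance is just one common choice, not something specific to this platform.

```python
from math import sqrt

# Hypothetical probe profiles across four samples, labelled GENE|PROBE.
PROBES = {
    "GENE_A|ILMN_0001": [1.0, 2.0, 3.0, 4.0],
    "GENE_A|ILMN_0002": [4.0, 3.0, 2.0, 1.0],
    "GENE_B|ILMN_0003": [2.0, 4.0, 6.0, 8.0],
}


def pearson_distance(x, y):
    """Return 1 - Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)


def distance_matrix(probes):
    """Pairwise distances between all probes, in sorted label order."""
    labels = sorted(probes)
    matrix = [
        [pearson_distance(probes[a], probes[b]) for b in labels]
        for a in labels
    ]
    return labels, matrix


labels, matrix = distance_matrix(PROBES)
for label, row in zip(labels, matrix):
    print(label, [round(d, 3) for d in row])
```

In this toy example the two GENE_A probes are as far apart as two profiles can be, while one of them matches the GENE_B probe closely. Averaging the two GENE_A probes would hide that difference, whereas a distance matrix like this can be passed to any hierarchical clustering routine with the gene label kept on every probe.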