Hi,
I am working on a microarray data set whose platform is GPL96 ([HG-U133A]). I therefore used the GPL96 annotation file to annotate the gene expression data and convert probe IDs into gene identifiers. My problem is that the annotation file contains three columns of gene identifiers: "Gene title", "Gene symbol" and "Gene ID". Because multiple probes sometimes map to one gene, I used the aggregate() function as follows:
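# Average all probes that share a gene symbol (columns 1 and 2 are identifiers, not expression values)
my_aggregate_Expr_data <- aggregate(my_Expr_data[, -c(1, 2)],
                                    by = list(gene_name = my_Expr_data$`Gene symbol`),
                                    FUN = mean,
                                    na.rm = TRUE)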
where the first column of my_Expr_data is "Probe ID" and the second column is "Gene symbol".
However, "Gene symbol" is not a good identifier, and I need one such as the Ensembl ID, which uniquely identifies each gene. The GPL96 annotation file has no Ensembl ID column for the probes. Given this, can I use "Gene ID" as a unique identifier for each gene associated with one or more probes? I would appreciate it if anybody could share their thoughts.
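As a minimal sketch of how aggregating on "Gene ID" instead of "Gene symbol" could look, here is a toy example in base R. Everything in it is an assumption made for illustration: the table `toy_annot`, its probe names, IDs and values are made up, and it assumes that probes mapping to several genes list their IDs joined by "///". Such probes, and probes with no Gene ID, are set aside before averaging.

```r
# Toy table (all probe names, IDs and values are made up for illustration).
toy_annot <- data.frame(
  probe   = c("p1", "p2", "p3", "p4"),
  gene_id = c("101", "101", "202///303", NA),  # "///" = probe maps to several genes (assumed format)
  sample1 = c(10, 12, 7, 3),
  sample2 = c(8, 6, 5, 4),
  stringsAsFactors = FALSE
)

# Keep only probes that map to exactly one Gene ID.
is_single  <- !is.na(toy_annot$gene_id) &
              !grepl("///", toy_annot$gene_id, fixed = TRUE)
single_map <- toy_annot[is_single, ]

# Average the remaining probes per Gene ID.
by_gene_id <- aggregate(single_map[, c("sample1", "sample2")],
                        by = list(gene_id = single_map$gene_id),
                        FUN = mean,
                        na.rm = TRUE)
print(by_gene_id)
```

Whether to drop multi-gene probes or handle them some other way is a design choice. The sketch only shows one conservative option.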
Best Regards,
Dear Dr. Blighe,
Thanks for your comment. I also added 'ENTREZID' to your code, but as you can see, for 1007_s_at I get 8 Ensembl IDs. Their gene symbols are DDR1 and MIR4640. So my exact question is: which Ensembl ID should I use for DDR1 at 1007_s_at? Or by what mechanism can I determine which Ensembl ID is my target for downstream analysis of the 1007_s_at probe? I would appreciate it if you could share your comments.
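To make the situation concrete, here is a small base R sketch that counts how many distinct Ensembl IDs each probe and symbol receives in a one-to-many mapping table. The table `toy_map` is hypothetical: the column names PROBEID, SYMBOL and ENSEMBL are assumed, "probe_x" and "GENE_X" are invented, and the Ensembl-style IDs are placeholders rather than real accessions. It only flags ambiguous probes and does not decide which ID is the right one.

```r
# Toy one-to-many mapping table (placeholder IDs, not real Ensembl accessions).
toy_map <- data.frame(
  PROBEID = c(rep("1007_s_at", 4), "probe_x"),
  SYMBOL  = c("DDR1", "DDR1", "DDR1", "MIR4640", "GENE_X"),
  ENSEMBL = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D", "ENSG_E"),
  stringsAsFactors = FALSE
)

# Number of distinct Ensembl IDs per probe and gene symbol.
n_ids <- aggregate(ENSEMBL ~ PROBEID + SYMBOL, data = toy_map,
                   FUN = function(x) length(unique(x)))
names(n_ids)[3] <- "n_ensembl"
print(n_ids)

# Probes that receive more than one Ensembl ID overall.
per_probe <- tapply(toy_map$ENSEMBL, toy_map$PROBEID,
                    function(x) length(unique(x)))
ambiguous <- names(per_probe)[per_probe > 1]
print(ambiguous)
```

A per-symbol count like this shows whether the extra IDs come from the probe hitting two different genes or from one gene carrying several IDs.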
Best Regards,
That's right, Kevin. I have the same problem. Could you please let me know how I can solve this issue?