How can I annotate a microarray data set with Ensembl IDs?
4.4 years ago
modarzi ▴ 170

Hi,

I am working with a microarray data set whose platform is GPL96 ([HG-U133A]). I used the GPL96 annotation file to annotate the gene expression data and convert probe IDs into gene identifiers. My problem is that the annotation file has three gene identifier columns: "Gene title", "Gene symbol" and "Gene ID". Because multiple probes sometimes map to one gene, I used the aggregate() function as follows:

# Average the expression values of all probes that share a gene symbol.
my_aggregate_Expr_data <- aggregate(my_Expr_data[, -c(1, 2)],
                                    by = list(gene_name = my_Expr_data$`Gene symbol`),
                                    FUN = mean,
                                    na.rm = TRUE)

Here, the first column of my_Expr_data is "Prob ID" and the second is "Gene symbol", so my_Expr_data[, -c(1,2)] keeps only the expression columns.

However, "Gene symbol" is not a good identifier, and I need one such as the Ensembl ID, which indicates a unique position for each gene. The GPL96 annotation file has no Ensembl ID column for the probes. Given this, can I use "Gene ID" as a unique identifier for each gene associated with one or more probes? I would appreciate it if anybody could share their ideas with me.
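
To make the question concrete, here is a minimal sketch of collapsing probes by "Gene ID" instead of "Gene symbol". It rests on assumptions not stated in the post: the toy table and its values are invented, the sample column names (sample1, sample2) are hypothetical, and entries listing several genes are assumed to be separated by "///". Probes with no Gene ID or with more than one are dropped before averaging, which is only one possible policy.

```r
# Toy stand-in for the annotated expression table (all values are made up).
my_Expr_data <- data.frame(
  `Prob ID`     = c("p1", "p2", "p3", "p4", "p5"),
  `Gene symbol` = c("A", "A", "B", "", "C///D"),
  `Gene ID`     = c("10", "10", "20", "", "30///40"),
  sample1       = c(1, 3, 5, 7, 9),
  sample2       = c(2, 4, 6, 8, 10),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

gene_id <- my_Expr_data$`Gene ID`

# Keep probes with exactly one Gene ID: drop empty/NA entries and entries
# that list several genes (assumed here to be separated by "///").
keep <- !is.na(gene_id) & gene_id != "" & !grepl("///", gene_id, fixed = TRUE)

# Average the sample columns of all probes that share a Gene ID.
expr_by_gene <- aggregate(
  my_Expr_data[keep, c("sample1", "sample2")],
  by = list(gene_id = gene_id[keep]),
  FUN = mean,
  na.rm = TRUE
)

print(expr_by_gene)
```

Grouping on a numeric gene identifier avoids problems with symbols that change or are reused over time, but whether dropping multi-gene probes is acceptable depends on the downstream analysis.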

Best Regards,

Affymetrix Ensemble Id GPL 96 • 1.7k views
4.4 years ago

Hi, I would use hgu133a.db, assuming that you can get the original probe IDs (stored in probes):

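library(hgu133a.db)

# Probe IDs are taken from the row names of the expression object
probes <- rownames(gset)

# Look up Ensembl gene IDs and gene symbols for each probe
annotLookup <- AnnotationDbi::select(hgu133a.db, keys = probes,
  keytype = 'PROBEID',
  columns = c('PROBEID', 'ENSEMBL', 'SYMBOL'))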

You can modify the above code to work for Gene Symbol - to - Ensembl conversion, too.

Kevin


Dear Dr. Blighe

Thanks for your comment. I also added 'ENTREZID' to your code. But as you can see, for 1007_s_at I get 8 Ensembl IDs, and the gene symbols are DDR1 and MIR4640. So my exact question is: which Ensembl ID should I use for DDR1 at 1007_s_at? Or how can I determine which Ensembl ID is the right target for downstream analysis of the 1007_s_at probe?

I would appreciate it if you could share your thoughts with me.

Best Regards,

    PROBEID    ENSEMBL          SYMBOL   ENTREZID
1   1007_s_at  ENSG00000204580  DDR1     780
2   1007_s_at  ENSG00000223680  DDR1     780
3   1007_s_at  ENSG00000229767  DDR1     780
4   1007_s_at  ENSG00000230456  DDR1     780
5   1007_s_at  ENSG00000234078  DDR1     780
6   1007_s_at  ENSG00000137332  DDR1     780
7   1007_s_at  ENSG00000215522  DDR1     780
8   1007_s_at  ENSG00000284370  MIR4640  100616237
9   1053_at    ENSG00000049541  RFC2     5982
10  117_at     ENSG00000173110  HSPA6    3310
11  121_at     ENSG00000125618  PAX8     7849
12  1255_g_at  ENSG00000048545  GUCA1A   2978
13  1255_g_at  ENSG00000287363  GUCA1A   2978
14  1294_at    ENSG00000182179  UBA7     7318
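
A one-to-many lookup like the one in the table can be handled in several ways. The sketch below is not a definitive answer: it copies the first ten rows of the table into a data frame and shows three options in base R. The object names collapsed, multi_gene_probes and first_hit are hypothetical, and keeping the first hit per probe is an arbitrary choice. Several Ensembl gene IDs for one symbol can arise when a gene lies in a region with alternate assembly sequences (DDR1 sits in the MHC region on chromosome 6), but that should be checked in Ensembl rather than assumed.

```r
# First ten rows of the lookup table shown above.
annotLookup <- data.frame(
  PROBEID  = c(rep("1007_s_at", 8), "1053_at", "117_at"),
  ENSEMBL  = c("ENSG00000204580", "ENSG00000223680", "ENSG00000229767",
               "ENSG00000230456", "ENSG00000234078", "ENSG00000137332",
               "ENSG00000215522", "ENSG00000284370", "ENSG00000049541",
               "ENSG00000173110"),
  SYMBOL   = c(rep("DDR1", 7), "MIR4640", "RFC2", "HSPA6"),
  ENTREZID = c(rep("780", 7), "100616237", "5982", "3310"),
  stringsAsFactors = FALSE
)

# Option 1: one row per probe, with every Ensembl ID kept in a single field.
collapsed <- aggregate(
  ENSEMBL ~ PROBEID,
  data = annotLookup,
  FUN = function(x) paste(unique(x), collapse = ";")
)

# Option 2: flag probes that map to more than one Entrez gene, so they can
# be inspected or excluded before downstream analysis.
n_genes <- tapply(annotLookup$ENTREZID, annotLookup$PROBEID,
                  function(x) length(unique(x)))
multi_gene_probes <- names(n_genes)[n_genes > 1]

# Option 3: keep only the first row per probe (an arbitrary choice).
first_hit <- annotLookup[!duplicated(annotLookup$PROBEID), ]

print(collapsed)
print(multi_gene_probes)
print(first_hit)
```

Option 1 loses no information, Option 2 makes ambiguous probes explicit, and Option 3 is the simplest but depends on row order, so which one fits depends on what the downstream analysis needs.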

That's right, Kevin. I have the same problem. Could you please let me know how I can solve this issue?
