Annotation Affymetrix probesets to Gene symbols
1
0
Entering edit mode
5.2 years ago
mannoulag1 ▴ 130

Hi, I am using an 'Affymetrix Human Genome U95 Version 2 Array', I would like to replace the affymetrix probes with the gene symbols in my data matrix 'human'. which one of these code can I use and how correct it please?

 require("biomaRt")
 mart <- useMart("ENSEMBL_MART_ENSEMBL")
     mart <- useDataset("hsapiens_gene_ensembl", mart)
     annotLookup <- getBM(mart=mart,  attributes=c("affy_hg_u133_plus_2","ensembl_gene_id",
     "gene_biotype","external_gene_name"),  filter = "affy_hg_u133_plus_2",   values = rownames(human), uniqueRows=TRUE)
    indicesLookup <- match(rownames(human), annotLookup$affy_hg_u133_plus_2)
    rownames(human) <- paste(annotLookup[indicesLookup, "external_gene_name"], c(1:length(indicesLookup)), sep="_")
    rownames(human) <-gsub("_[0-9]*$", "", rownames(human))
    head(rownames(human))
        [1] "NA" "NA" "NA" "NA" "NA" "NA"

in the second one, I obtained a lower size data matrix 'names' with the affy probes id and the symboles? how to correct the code and how to remove the probes id from this matrix? thank you

library(hgu95av2.db) 
    x <- hgu95av2SYMBOL
    # Get the probe identifiers that are mapped to a gene symbol
    mapped_probes <- mappedkeys(x)
    xx <- as.list(x[mapped_probes])
    names <-merge(x,human,by.x=1,by.y=0)
RNA-Seq R affymetrix gene • 6.7k views
ADD COMMENT
0
Entering edit mode

I also have this data: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE133824 and I would like to convert all Affymetrix HTA 2.0 IDs to standard gene symbols for each sample. The methods above do not work with this data. Can someone help with this please

ADD REPLY
0
Entering edit mode

Hi javanokendo, I have posted an answer here as a Tutorial (for you and future users): Affymetrix HTA 2.0 id conversion

ADD REPLY
3
Entering edit mode
5.2 years ago

Hello,

From where did you obtain the code for the biomaRt workflow? - one of my posts? If you use that, then you need to select a different attribute, because, in that code that you have pasted, you are assuming that the probe IDs are derived from the 'Affymetrix U133 Plus 2.0'; however, you have the 'Affymetrix U95 Version 2'. You can find attributes via biomaRt like this:

listAttributes(mart)[grep("affy", listAttributes(mart)[,1]),]
                     name                 description         page
104          affy_hc_g110          AFFY HC G110 probe feature_page
105         affy_hg_focus         AFFY HG Focus probe feature_page
106         affy_hg_u133a         AFFY HG U133A probe feature_page
107       affy_hg_u133a_2       AFFY HG U133A 2 probe feature_page
108         affy_hg_u133b         AFFY HG U133B probe feature_page
109   affy_hg_u133_plus_2   AFFY HG U133 Plus 2 probe feature_page
110          affy_hg_u95a          AFFY HG U95A probe feature_page
111        affy_hg_u95av2        AFFY HG U95Av2 probe feature_page
112          affy_hg_u95b          AFFY HG U95B probe feature_page
113          affy_hg_u95c          AFFY HG U95C probe feature_page
114          affy_hg_u95d          AFFY HG U95D probe feature_page
115          affy_hg_u95e          AFFY HG U95E probe feature_page
116          affy_hta_2_0          AFFY HTA 2 0 probe feature_page
117   affy_huex_1_0_st_v2   AFFY HuEx 1 0 st v2 probe feature_page
118         affy_hugenefl         AFFY HuGeneFL probe feature_page
119 affy_hugene_1_0_st_v1 AFFY HuGene 1 0 st v1 probe feature_page
120 affy_hugene_2_0_st_v1 AFFY HuGene 2 0 st v1 probe feature_page
121        affy_primeview        AFFY PrimeView probe feature_page
122         affy_u133_x3p         AFFY U133 X3P probe feature_page

...so, you will want to use affy_hg_u95av2.

As to whether or not the general code will work, please take time to examine what each line is doing, make QC checks, and then make an 'executive' decision about whether or not it is doing what you want. To me, it looks like the code will do what you want it to do, but please do check yourself.

--------------------

Regarding the other code that you have pasted using hgu95av2.db, I am not so sure... You simply have to do something like the following:

library(hgu95av2.db) 

mapIds(
  hgu95av2.db,
  c('991_g_at','976_s_at','954_s_at', 'hhh', '39167_r_at'),
  keytype = 'PROBEID',
  column = 'SYMBOL')

'select()' returned 1:1 mapping between keys and columns
  991_g_at   976_s_at   954_s_at        hhh 39167_r_at 
    "FLT1"    "MAPK1"   "PPP1CA"         NA "SERPINH1"

I added a dummy probe ID, 'hhh', just to show you that it returns NA when no match is found.

-------------------------------

Kevin

ADD COMMENT
0
Entering edit mode

Hi kevin, thank you very much, so I did this:

 library("biomaRt")
    library(hgu95av2.db) 
    mart <- useMart("ENSEMBL_MART_ENSEMBL")
    mart <- useDataset("hsapiens_gene_ensembl", mart)
    annotLookup <- getBM(mart=mart,  attributes=c("affy_hg_u95av2","hgnc_symbol"),
      filter = "affy_hg_u95av2",   values = rownames(human), uniqueRows=TRUE)
      indicesLookup <- match(rownames(human), annotLookup$affy_hg_u95av2)
     rownames(human) <- paste(annotLookup[indicesLookup, "hgnc_symbol"], c(1:length(indicesLookup)), sep="_")
    rownames(human) <-gsub("_[0-9]*$", "", rownames(human))
    head(rownames(human))
[1] "RABGGTA" "GDPD3"   "TIE1"    ""        "CXCR5"   "CXCR5"
ADD REPLY
1
Entering edit mode

Looks okay to me. You should always check the output of each command so that you are sure that you know what it is doing. Here is a fully reproducible example:

[1] Download study data from GEO that used AFFY HG U95Av2

library(Biobase)
library(GEOquery)
human <- getGEO("GSE713", GSEMatrix =TRUE, getGPL=FALSE)
if (length(human) > 1) idx <- grep("GPL8300", attr(human, "names")) else idx <- 1
human <- human[[idx]]

human is an ExpressionSet object with Affymetrix probe IDs as rownames

human
ExpressionSet (storageMode: lockedEnvironment)
assayData: 12623 features, 20 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: 18.3 24.2 ... 23.2 (20 total)
  varLabels: title geo_accession ... data_row_count (30 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
  pubMedIds: 12907719 
Annotation: GPL8300

rownames(human)[1:8]
 [1] "1002_f_at" "1003_s_at" "1004_at"   "1005_at"   "1006_at"   "1007_s_at"
 [7] "1008_f_at" "1009_at"

[2] Look up the probe IDs with biomaRt

library("biomaRt")
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)
annotLookup <- getBM(
  mart=mart,
  attributes=c("affy_hg_u95av2","hgnc_symbol"),
  filter = "affy_hg_u95av2",
  values = rownames(human),
  uniqueRows=TRUE)

biomaRt never returns data in the same order as it's input:

head(annotLookup)
  affy_hg_u95av2 hgnc_symbol
1        1787_at      CDKN1C
2         157_at       PRAME
3      1227_g_at      ADAM17
4        1226_at      ADAM17
5       181_g_at      MBOAT7
6         180_at      MBOAT7

So, we have to align annotLookup to our probe ID rownames.

[3] Align annotLookup to our probe ID rownames

indicesLookup <- match(rownames(human), annotLookup$affy_hg_u95av2)

Does order match?

head(data.frame(rownames(human), annotLookup[indicesLookup,]))
    rownames.human. affy_hg_u95av2 hgnc_symbol
178       1002_f_at      1002_f_at            
247       1003_s_at      1003_s_at       CXCR5
248         1004_at        1004_at       CXCR5
184         1005_at        1005_at       DUSP1
659         1006_at        1006_at       MMP10

table(rownames(human) == annotLookup[indicesLookup,'affy_hg_u95av2'])
 TRUE 
11611

Great. However, indicesLookup will contain a NA value in situations where a probe ID was not even found via biomaRt.

You'll also notice that some probe IDs have no HGNC symbols, as indicated by empty values in the hgnc_symbol column. For a data matrix object like that used within ExpressionSets, empty rownames / non-unique rownames are tolerated, but they are not tolerated in data-frames.

For these empty values, we can use the original probe ID, if we want, and create a new column in annotLookup:

annotLookup$hgnc_X_affy <- ifelse(
  annotLookup$hgnc_symbol == "",
  annotLookup$affy_hg_u95av2,
  annotLookup$hgnc_symbol)

head(annotLookup)
  affy_hg_u95av2 hgnc_symbol hgnc_X_affy
1        1787_at      CDKN1C      CDKN1C
2         157_at       PRAME       PRAME
3      1227_g_at      ADAM17      ADAM17
4        1226_at      ADAM17      ADAM17
5       181_g_at      MBOAT7      MBOAT7
6         180_at      MBOAT7      MBOAT7

tail(annotLookup)
      affy_hg_u95av2 hgnc_symbol hgnc_X_affy
13530       736_f_at                736_f_at
13531         896_at                  896_at
13532         733_at                  733_at
13533         854_at                  854_at
13534     41782_g_at              41782_g_at
13535       944_s_at                944_s_at

Finally, to ensure complete compliance with data-frames, we can ensure that each gene has a unique identifier (optional):

[4] Ensure that each gene has a unique identifier to avoid duplicate issues

newnames <- paste(
  annotLookup[indicesLookup, "hgnc_X_affy"],
  c(1:length(indicesLookup)),
  sep="_")

rownames(human) <- newnames

exprs(human)[1:8,1:5]
             18.3  24.2   36.9  24.7   47.5
1002_f_at_1   6.8   4.1    2.8   5.5    5.8
CXCR5_2      34.3  27.8   25.0  11.0   31.8
CXCR5_3       6.3   7.4   37.9  11.7   36.0
DUSP1_4     636.4 565.0  418.1 454.4  180.1
MMP10_5       8.5   5.0    2.1  15.6    1.9
DDR1_6      142.5 181.4  231.6 222.1  185.7
EIF2AK2_7   809.6 892.2 1243.7 774.7 1041.8
HINT1_8     807.9 674.6  990.0 670.3  822.0

tail(exprs(human))[,1:5]
         18.3 24.2 36.9 24.7 47.5
NA_12618  4.3  2.7  3.0 14.3 23.3
NA_12619  1.6  2.2  1.8  1.7  5.7
NA_12620  4.9  1.3 17.8  1.3 15.5
NA_12621  1.3  1.4 11.1  1.9  3.8
ADD REPLY
0
Entering edit mode

yes, excellent !! thank you very very much Kevin

ADD REPLY

Login before adding your answer.

Traffic: 1738 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6