Affymetrix Human Genome U133 Plus 2.0 Array
6.2 years ago

I downloaded .soft data from the GEO database. For my research I need the genomic location of each gene represented on the Affymetrix GeneChip, so I used the biomaRt package. When I map the .soft file against the resulting annotLookup table, some probe IDs are missing from annotLookup. How can I get the information for those genes? Please help. The dataset is GDS3487.soft from GEO. For example, 1552819_at is present in the .soft file but not in annotLookup.


Which probe IDs are not correctly mapping? They are most likely one of:

  • control probes
  • probes that map to some in silico predicted / unconfirmed / unofficial genes

Did you annotate as I show here: "A: How do I convert Affymetrix ID names to gene names?"
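A quick way to check the first category is sketched below. It assumes that control probe sets on this array carry the `AFFX` prefix and that the probe IDs from the .soft file and from annotLookup are available as character vectors. The vectors `soft_ids` and `annotated_ids` are hypothetical stand-ins, not the real data.

```r
# Hypothetical stand-in for the probe ID column of the .soft table.
# The two AFFX entries illustrate control probe sets; the others are IDs from this thread.
soft_ids <- c("AFFX-BioB-5_at", "AFFX-HUMGAPDH/M33197_5_at",
              "1552819_at", "243275_at", "228151_at")

# Hypothetical stand-in for the probe ID column of annotLookup
annotated_ids <- c("228151_at")

# Probe IDs present in the .soft file but absent from the annotation table
unmapped <- setdiff(soft_ids, annotated_ids)

# Split the unmapped IDs into control probes (assumed AFFX prefix) and the rest
is_control <- grepl("^AFFX", unmapped)
control_probes <- unmapped[is_control]
other_unmapped <- unmapped[!is_control]

print(control_probes)
print(other_unmapped)
```

Whatever remains in `other_unmapped` is the set worth investigating further.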


The following probe IDs are not mapped to any gene in the refGene.txt data. In fact, 13143 IDs do not map to any gene in the Ensembl and NCBI datasets. Some of the IDs are:

243275_at
243171_at
1560728_at
239715_at
1562920_at
228151_at
1556172_at
240090_at
217022_s_at
242315_at
1562091_at
239924_at
243063_at
238255_at
241021_at
243947_s_at
1560528_at
243794_at
234258_at
234663_at
235875_at
1570423_at
242481_at
241001_at

Can anybody help me find the reason for this? The .soft file gives a name for each of these probe IDs; any idea where those names come from? Please help. According to Ensembl, only 27199 IDs map to a gene name; the remaining probe IDs lack an identity.


Can you show the code that you have used?

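# biocLite() is defunct; install with BiocManager::install("biomaRt") if needed
library(biomaRt)
ensembl <- useMart("ensembl")
ensembl <- useDataset("hsapiens_gene_ensembl", mart = ensembl)
# dataHT[,1] holds the probe IDs; filters and values must be given in the same order
# (newer Ensembl releases name the 'entrezgene' attribute 'entrezgene_id')
map1 <- getBM(attributes = c('affy_hg_u133_plus_2', 'hgnc_symbol', 'chromosome_name', 'start_position', 'end_position', 'band', 'ensembl_gene_id', 'entrezgene'),
              filters = c('affy_hg_u133_plus_2', 'with_entrezgene'),
              values = list(dataHT[,1], TRUE),
              mart = ensembl)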

Hey, thank you very much.

I was able to annotate many of the probes that you listed using the following sequence of commands:

probes
 [1] "243275_at"   "243171_at"   "1560728_at"  "239715_at"   "1562920_at" 
 [6] "228151_at"   "1556172_at"  "240090_at"   "217022_s_at" "242315_at"  
[11] "1562091_at"  "239924_at"   "243063_at"   "238255_at"   "241021_at"  
[16] "243947_s_at" "1560528_at"  "243794_at"   "234258_at"   "234663_at"  
[21] "235875_at"   "1570423_at"  "242481_at"   "241001_at"

library(biomaRt)

ensembl=useMart("ensembl")

ensembl=useDataset("hsapiens_gene_ensembl",mart=ensembl)

getBM(mart=ensembl,
    attributes=c("affy_hg_u133_plus_2", "ensembl_gene_id", "gene_biotype", "external_gene_name"),
    filters="affy_hg_u133_plus_2",
    values=probes,
    uniqueRows=TRUE)
   affy_hg_u133_plus_2 ensembl_gene_id                       gene_biotype external_gene_name
1            243794_at ENSG00000256906                            lincRNA          LINC02419
2           1570423_at ENSG00000237975                          antisense            FLG-AS1
3            239715_at ENSG00000244627 transcribed_unprocessed_pseudogene             TPTEP2
4           1562920_at ENSG00000271714                          antisense         AC010501.1
5            243171_at ENSG00000229196                          antisense         AC087071.1
6          217022_s_at ENSG00000211895                          IG_C_gene              IGHA1
7            241001_at ENSG00000249673                          antisense          NOP14-AS1
8          217022_s_at ENSG00000211890                          IG_C_gene              IGHA2
9            242481_at ENSG00000112852                     protein_coding             PCDHB2
10           239924_at ENSG00000272578 transcribed_unprocessed_pseudogene         AP000347.1
11         217022_s_at ENSG00000282633                          IG_C_gene              IGHA1
12           238255_at ENSG00000248401               processed_pseudogene         AC114781.1
13         217022_s_at ENSG00000276173                          IG_C_gene              IGHA2
14           228151_at ENSG00000185305                     protein_coding              ARL15
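To see exactly which probes got no annotation, and to deal with probes such as 217022_s_at that map to several Ensembl genes, something like the following may help. It is only a sketch: the data frame `res` is typed in by hand from the output above rather than fetched with getBM, and the choice to collapse multiple gene names with ";" is just one possible convention.

```r
probes <- c("243275_at", "243171_at", "1560728_at", "239715_at", "1562920_at",
            "228151_at", "1556172_at", "240090_at", "217022_s_at", "242315_at",
            "1562091_at", "239924_at", "243063_at", "238255_at", "241021_at",
            "243947_s_at", "1560528_at", "243794_at", "234258_at", "234663_at",
            "235875_at", "1570423_at", "242481_at", "241001_at")

# Hand-copied subset of the getBM() result shown above (probe ID and gene name only)
res <- data.frame(
  affy_hg_u133_plus_2 = c("243794_at", "1570423_at", "239715_at", "1562920_at",
                          "243171_at", "217022_s_at", "241001_at", "217022_s_at",
                          "242481_at", "239924_at", "217022_s_at", "238255_at",
                          "217022_s_at", "228151_at"),
  external_gene_name = c("LINC02419", "FLG-AS1", "TPTEP2", "AC010501.1",
                         "AC087071.1", "IGHA1", "NOP14-AS1", "IGHA2",
                         "PCDHB2", "AP000347.1", "IGHA1", "AC114781.1",
                         "IGHA2", "ARL15"),
  stringsAsFactors = FALSE
)

# Probes that were queried but returned no row at all
unmapped <- setdiff(probes, res$affy_hg_u133_plus_2)
print(unmapped)

# One row per probe: distinct gene names joined with ";"
per_probe <- aggregate(external_gene_name ~ affy_hg_u133_plus_2, data = res,
                       FUN = function(x) paste(unique(x), collapse = ";"))
print(per_probe)
```

With the real getBM result in place of the hand-typed `res`, the same two steps apply to the full probe list.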

I checked 2 of the examples that fail to be annotated and they are both probes that target genes whose genomic regions appear to have been removed from GRCh38 (but that are present in GRCh37). If you go to the UCSC Genome Browser, you can simply search for these. I have initially searched for:

  • 1560728_at
  • 1556172_at

That may not be the complete story, though.
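One way to test the GRCh37 explanation is to repeat the query against the GRCh37 archive. The sketch below uses biomaRt's `useEnsembl()` with `GRCh = 37`; it assumes the archive mart exposes the same attribute and filter names as the current one, which I have not verified for every attribute. The wrapper name `annotate_grch37` is made up for illustration.

```r
# Hypothetical helper: look up probe IDs in the GRCh37 Ensembl archive.
# Requires the biomaRt package and network access when it is called.
annotate_grch37 <- function(probe_ids) {
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("The biomaRt package is required")
  }
  # GRCh = 37 points useEnsembl() at the GRCh37 archive site
  mart37 <- biomaRt::useEnsembl(biomart = "ensembl",
                                dataset = "hsapiens_gene_ensembl",
                                GRCh = 37)
  # Attribute names are assumed to match those used in the GRCh38 query
  biomaRt::getBM(mart = mart37,
                 attributes = c("affy_hg_u133_plus_2", "ensembl_gene_id",
                                "gene_biotype", "external_gene_name"),
                 filters = "affy_hg_u133_plus_2",
                 values = probe_ids,
                 uniqueRows = TRUE)
}

# Example call (not run here because it needs network access):
# annotate_grch37(c("1560728_at", "1556172_at"))
```

If the two probes come back annotated on GRCh37 but not on GRCh38, that would support the explanation above.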

Note that you can download comprehensive annotation for this array version from the Affymetrix / Thermofisher support site: GeneChip™ Human Genome U133 Plus 2.0 Array

[the file you may want is likely the one called 'Current NetAffx Annotation Files: HG-U133_Plus_2 Annotations, CSV format, Release 36']
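Reading that CSV into R could look roughly like this. The sketch assumes the file starts with metadata lines beginning with "#", uses the column names "Probe Set ID" and "Gene Symbol", and marks missing values with "---"; please check these against the file you actually download. To keep the example self-contained it writes a tiny mock file first; the mock rows (including `mock_probe_at`) are placeholders, not real annotation.

```r
# Write a small mock file that mimics the assumed layout of the NetAffx CSV
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  '## mock metadata line (the real file is assumed to start with "#" lines)',
  '"Probe Set ID","Gene Symbol"',
  '"228151_at","ARL15"',
  '"242481_at","PCDHB2"',
  '"mock_probe_at","---"'
), csv_path)

# comment.char = "#" skips the metadata lines; check.names = FALSE keeps
# column names with spaces exactly as they appear in the header
annot <- read.csv(csv_path, comment.char = "#", check.names = FALSE,
                  stringsAsFactors = FALSE)

# Probes without a gene symbol ("---" is assumed to be the missing-value marker)
symbol <- annot[["Gene Symbol"]]
no_symbol <- is.na(symbol) | symbol %in% c("---", "")
missing_symbol <- annot[["Probe Set ID"]][no_symbol]
print(missing_symbol)
```

For the real file, replace `csv_path` with the path to the downloaded CSV and skip the `writeLines()` step.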


Thank you, Kevin. I am planning to move forward with the dataset from the Thermofisher support site, 'Current NetAffx Annotation Files: HG-U133_Plus_2 Annotations, CSV format, Release 36'. Only 231 probe IDs are missing the external_gene_name, which is far better than the previous dataset I used. Thank you once again.
