Question

"Old" microarray probe IDs: correspondence with gene names

0

5.8 years ago

guillaume.rbt ★ 1.0k

Hi all,

I'm working on a 2006 public microarray dataset ( https://www.ebi.ac.uk/arrayexpress/experiments/E-TABM-104/ ).

I've reanalysed the data to get differentially expressed transcripts, and now I'm trying to test pathway enrichment for some gene sets.

The problem I'm facing is that I need to identify which probes on the chip correspond to the genes in the sets I want to test. The chip probes are annotated with old EMBL transcript IDs (most of the IDs look like AAXXXXXX, AIXXXXXX, HXXXXX, NXXXXX, RXXXXX or TXXXXX, where each X is a digit; for example, I know that "AI375736" corresponds to the CD28 gene).

I'm not really sure how to find the correspondence between the genes I want to study and these transcript IDs.

If anyone has any advice on how to do that it would be very helpful.

Many thanks

microarray trancripts annotation embl • 1.2k views

updated 5.8 years ago by Pierre Lindenbaum 164k • written 5.8 years ago by guillaume.rbt ★ 1.0k

0

The array is quite old indeed. There are mappings to what appear to be gene descriptions, here:

Check the Excel files.

The arrays are Agilent but do not appear to be supported in biomaRt. However, I note that these IDs that you list are likely GenBank accession IDs and not probe names.

5.8 years ago by Kevin Blighe 88k

0

Thank you for your response. I found the IDs in those files; the exact name of the column is "Reporter Database Entry[embl]", so it is indeed not the probe name.

5.8 years ago by guillaume.rbt ★ 1.0k

1

You may try to map them with this code, in that case:

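# Example accessions taken from the array annotation
ids <- c("AI375736", "AI092544")

library(biomaRt)

# Connect to the Ensembl human gene dataset
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)

# Look up genes whose EMBL cross-reference matches one of the accessions
getBM(
  mart = mart,
  attributes = c("protein_id", "embl", "ensembl_gene_id", "gene_biotype", "external_gene_name"),
  filters = "embl",
  values = ids,
  uniqueRows = TRUE)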

I tried, but it failed. Some IDs may map, though. Otherwise, you could consider using eUtils to map these to gene symbols.

5.8 years ago by Kevin Blighe 88k

0

Thank you very much for trying. I will check other IDs to see if it works for them.

5.8 years ago by guillaume.rbt ★ 1.0k

score 3 · Accepted Answer · 2019-02-06

Using the UCSC public MySQL server (your example is an EST):

$ mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -P 3306 -D hg38 -e 'select distinct E.qName,E.tName,E.tStart,E.tEnd,K.name,K.name2,K.txStart,K.txEnd from all_est as E,wgEncodeGencodeBasicV28 as K where E.qName="AI375736" and K.chrom=E.tName and NOT( K.txEnd < E.tStart || E.tEnd < K.txStart) ;'
+----------+-------+-----------+-----------+-------------------+-------+-----------+-----------+
| qName    | tName | tStart    | tEnd      | name              | name2 | txStart   | txEnd     |
+----------+-------+-----------+-----------+-------------------+-------+-----------+-----------+
| AI375736 | chr2  | 203735217 | 203735676 | ENST00000374481.7 | CD28  | 203706474 | 203738910 |
| AI375736 | chr2  | 203735217 | 203735676 | ENST00000324106.8 | CD28  | 203706547 | 203738912 |
+----------+-------+-----------+-----------+-------------------+-------+-----------+-----------+
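Since the array lists many accessions, the same overlap query can be generated for several IDs at once. The following is only a sketch under assumptions: `sql_in_list` and `build_query` are hypothetical helper names, the selected columns are trimmed to the accession and gene symbol, and it assumes every ID is an EST present in `all_est` (accessions that are not ESTs would return nothing). The two IDs are the ones quoted earlier in the thread.

```shell
#!/bin/sh
# Build a quoted, comma-separated SQL list from the IDs given as arguments.
sql_in_list() {
  list=""
  for id in "$@"; do
    # Skip empty or non-alphanumeric values so nothing unexpected enters the SQL.
    case "$id" in
      *[!A-Za-z0-9]*|"") continue ;;
    esac
    list="${list:+$list,}\"$id\""
  done
  printf '%s\n' "$list"
}

# Compose the overlap query from the answer above, for several accessions.
# An EST and a transcript overlap unless one ends before the other starts.
build_query() {
  printf 'select distinct E.qName,K.name2 from all_est as E,wgEncodeGencodeBasicV28 as K where E.qName in (%s) and K.chrom=E.tName and NOT( K.txEnd < E.tStart || E.tEnd < K.txStart);\n' "$(sql_in_list "$@")"
}

# Print the query only; nothing is sent to the server here.
build_query AI375736 AI092544
```

To run it, the printed query can be piped into the same client invocation used above, for example `build_query AI375736 AI092544 | mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -P 3306 -D hg38`, which requires network access to the UCSC server.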