Correspondence between "old" microarray probe IDs and gene names
5.8 years ago
guillaume.rbt ★ 1.0k

Hi all,

I'm working on a 2006 public microarray dataset ( https://www.ebi.ac.uk/arrayexpress/experiments/E-TABM-104/ ).

I've reanalysed the data to get differentially expressed transcripts, and now I'm trying to test pathway enrichment for some gene sets.

The problem I'm facing is that I need to identify which probes on the chip correspond to the genes in the sets I want to test. The chip probes are annotated with old EMBL transcript IDs (most of the IDs look like AAXXXXXX, AIXXXXXX, HXXXXX, NXXXXX, RXXXXX, TXXXXX, with digits for the Xs; for example, I know that "AI375736" corresponds to the CD28 gene).

I'm not really sure how to find a correspondence between the genes I want to study and these transcript IDs.

If anyone has any advice on how to do that, it would be very helpful.

Many thanks


The array is quite old indeed. There are mappings to what appear to be gene descriptions, here:

Check the Excel files.

The arrays are Agilent but do not appear to be supported in biomaRt. However, I note that these IDs that you list are likely GenBank accession IDs and not probe names.
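
As a quick sanity check on that point, the IDs can be screened by shape. This is a minimal sketch, assuming the two classic GenBank nucleotide accession formats (one letter plus five digits, or two letters plus six digits); `looks_like_genbank_accession` is a hypothetical helper name, and `H12345` and `A_23_P100001` are made-up test inputs, not IDs taken from this array. A match only says the ID has the right shape, not that the accession exists.

```shell
# Classify an ID by shape only (assumed patterns: 1 letter + 5 digits,
# or 2 letters + 6 digits). This does not check that the accession exists.
looks_like_genbank_accession() {
  if printf '%s\n' "$1" | grep -Eq '^([A-Z][0-9]{5}|[A-Z]{2}[0-9]{6})$'; then
    echo "accession-like"
  else
    echo "other"
  fi
}

# AI375736 and AI092544 come from this thread; the last two are made up.
for id in AI375736 AI092544 H12345 A_23_P100001; do
  printf '%s\t%s\n' "$id" "$(looks_like_genbank_accession "$id")"
done
```

Anything reported as "other" would need a different lookup route than an accession-based one.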


Thank you for your response. I found the IDs in those files; the exact name of the column is "Reporter Database Entry[embl]", so it is indeed not the probe name.


In that case, you could try to map them with this code:

# Example accessions from the array annotation
ids <- c("AI375736", "AI092544")

library(biomaRt)

# Connect to the Ensembl human gene dataset
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)

# Look up the accessions through the "embl" filter
getBM(
  mart = mart,
  attributes = c("protein_id", "embl", "ensembl_gene_id", "gene_biotype", "external_gene_name"),
  filters = "embl",
  values = ids,
  uniqueRows = TRUE)

I tried it but got no matches. Some IDs may still map, though. Otherwise, you could consider NCBI E-utilities (eUtils) to map these to gene symbols.
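
A first step with E-utilities could be to fetch the record for one accession and inspect it. The sketch below only builds the `efetch` request URL; `efetch_url` is a hypothetical helper name, and `db=nuccore` is an assumption about where these EST accessions are served. Fetching the record does not by itself give a gene symbol; whether the record text names a gene varies from record to record.

```shell
# Build an E-utilities efetch URL for a single accession.
# efetch_url is a hypothetical helper; db=nuccore is an assumption.
efetch_url() {
  printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=%s&rettype=gb&retmode=text\n' "$1"
}

efetch_url AI375736

# Usage (requires network access), printing the record's DEFINITION line:
# curl -s "$(efetch_url AI375736)" | grep -m1 '^DEFINITION'
```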


Thank you very much for trying. I will check other IDs to see if it works for them.

5.8 years ago

Using the UCSC public MySQL server (your example ID is an EST):

$ mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -P 3306 -D hg38 -e '
    SELECT DISTINCT E.qName, E.tName, E.tStart, E.tEnd, K.name, K.name2, K.txStart, K.txEnd
    FROM all_est AS E, wgEncodeGencodeBasicV28 AS K
    WHERE E.qName = "AI375736"
      AND K.chrom = E.tName
      AND NOT (K.txEnd < E.tStart OR E.tEnd < K.txStart);'
+----------+-------+-----------+-----------+-------------------+-------+-----------+-----------+
| qName    | tName | tStart    | tEnd      | name              | name2 | txStart   | txEnd     |
+----------+-------+-----------+-----------+-------------------+-------+-----------+-----------+
| AI375736 | chr2  | 203735217 | 203735676 | ENST00000374481.7 | CD28  | 203706474 | 203738910 |
| AI375736 | chr2  | 203735217 | 203735676 | ENST00000324106.8 | CD28  | 203706547 | 203738912 |
+----------+-------+-----------+-----------+-------------------+-------+-----------+-----------+
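
For a whole list of accessions, the same overlap query can be generated once with an `IN (...)` list instead of being run ID by ID. This is a minimal sketch: `build_est_query` is a hypothetical helper, not part of any UCSC tool, and it selects only the accession and gene name columns. Note that the query tests overlap with the whole transcript span (`txStart` to `txEnd`) and ignores strand, so ESTs that fall in introns or on the opposite strand will also match.

```shell
# Build the EST-vs-GENCODE overlap query for several accessions at once.
# build_est_query is a hypothetical helper, not part of any UCSC tool.
build_est_query() {
  ids=""
  for id in "$@"; do
    # Skip empty IDs and IDs with characters other than letters, digits,
    # dots, or underscores, so nothing unexpected ends up in the SQL.
    case "$id" in
      ""|*[!A-Za-z0-9._]*) continue ;;
    esac
    ids="${ids:+$ids,}\"$id\""
  done
  [ -n "$ids" ] || { echo "no valid IDs given" >&2; return 1; }
  printf 'SELECT DISTINCT E.qName, K.name2 FROM all_est AS E, wgEncodeGencodeBasicV28 AS K WHERE E.qName IN (%s) AND K.chrom = E.tName AND NOT (K.txEnd < E.tStart OR E.tEnd < K.txStart);\n' "$ids"
}

build_est_query AI375736 AI092544

# Usage (requires network access to the public server):
# mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -P 3306 -D hg38 \
#   -e "$(build_est_query AI375736 AI092544)"
```

The resulting two-column table (accession, gene name) can then be joined back to the array annotation on the accession column.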

Great, thanks for the tip!
