Question

Ambiguous Probe Id Re-Annotations


13.8 years ago

Phil ▴ 30

Hi All

I'm trying to re-annotate probe IDs from experiments carried out using a customised gene chip. The probe data is currently labelled with a variety of identifiers, mostly from a standardised probe set such as Agilent or Affymetrix. However, a small portion have an ambiguous description which does not conform to any known probe ID labelling scheme, and approximately one fifth of the total probe set has been listed using modified UniGene identifiers, many of which have since been retired. The only constant is that all sequence data originates from patient data and is human. Gene selection for chip customisation was guided by literature review. I am attempting to track down the dataset used to annotate these probes, but in the absence of that information I could do with a few suggestions.

We're currently looking at batch BLASTing the sequence fragments against a current nucleotide dataset (preferably RefSeq), but need an API/REST-driven service so that the searches can run fully unattended. I have identified the WABI resource as one way of doing so [http://xml.nig.ac.jp/rest/Invoke?service=Blast&method=searchParam&program=<PROGRAMME>&database=<DB_ID>&query=<SEQUENCE_STRING>&param=-b+<NUM_RETURNED_HITS>+-m+<TABLE_FORMAT>], but we would like to submit searches against RefSeq to bring the annotations in line with other data being produced here. WABI does not provide a direct entry point for RefSeq.
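To make the request format above concrete, here is a minimal sketch of how such a URL could be assembled per probe sequence. The endpoint and parameter names are taken from the URL above; the function name `build_wabi_blast_url` and the default values are only illustrative assumptions, and the values to use for `program` and `database` depend on what WABI actually offers.

```python
from urllib.parse import urlencode

# Endpoint as given in the URL template above.
WABI_ENDPOINT = "http://xml.nig.ac.jp/rest/Invoke"


def build_wabi_blast_url(sequence, program, database, num_hits=10, table_format=8):
    """Build one WABI BLAST searchParam URL (hypothetical helper, not a WABI client)."""
    query = {
        "service": "Blast",
        "method": "searchParam",
        "program": program,      # fills <PROGRAMME>
        "database": database,    # fills <DB_ID>
        "query": sequence,       # fills <SEQUENCE_STRING>
        # urlencode turns the spaces into '+', giving -b+<N>+-m+<FORMAT>
        "param": f"-b {num_hits} -m {table_format}",
    }
    return f"{WABI_ENDPOINT}?{urlencode(query)}"
```

Looping this over the probe sequences and fetching each URL would give the unattended batch submission; the fetching and any rate limiting are left out here.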

So, does anyone have experience updating identifiers from expression data? Where a retired identifier is provided, is re-annotation the best solution, or would it be preferable to follow the succession of IDs through their respective databases?
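For clarity, by "following the succession of IDs" I mean something like the sketch below. It assumes a lookup table from each retired ID to its single replacement has already been built from the source database's history; the function name `resolve_current_id` and the one-to-one mapping are assumptions for illustration, and retired identifiers that were split, merged, or dropped without a successor would need extra handling.

```python
def resolve_current_id(identifier, successors):
    """Follow retired -> successor links until an ID with no successor is reached.

    `successors` is a hypothetical dict mapping each retired ID to the ID
    that replaced it; IDs absent from the dict are treated as current.
    """
    seen = set()
    current = identifier
    while current in successors:
        if current in seen:
            # Guard against a circular history, which would otherwise loop forever.
            raise ValueError(f"cycle in ID history at {current}")
        seen.add(current)
        current = successors[current]
    return current
```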

All opinions, tips, suggestions, critiques welcomed.

Many thanks.

nucleotide probeset expression refseq • 2.9k views

updated 13.8 years ago by Chris Evelo 10k • written 13.8 years ago by Phil ▴ 30

score 2 · Answer 1 · 2011-10-21

Yes, we have done things like that; we even did the detective work sometimes. But I agree with Larry that, in general, the detective work is not a good idea. We used some sophisticated tricks for reannotation as well, and even published a paper on it once. It used a neat trick that essentially created a kind of RefSeq database before one existed, but that is no longer needed. I would just download Ensembl transcripts (with, say, 500 bp on either end) and BLAST (or BLAT) against those. That is what @Martijn did to annotate the NuGO arrays.
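Once the BLAST run is done, picking an annotation per probe can be as simple as the sketch below. It assumes the standard 12-column tabular BLAST output (query ID first, subject ID second, bit score last) and keeps the highest-scoring hit per probe; the function name `best_hits` is made up for illustration, and this is not the exact procedure used for the NuGO arrays.

```python
import csv
import io


def best_hits(tabular_text):
    """Return {probe_id: (transcript_id, bitscore)} for the top-scoring hit per probe.

    Assumes 12-column tab-separated BLAST output:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment headers
        query_id, subject_id = row[0], row[1]
        bitscore = float(row[11])
        # Keep the first hit seen with the highest bit score for each probe.
        if query_id not in best or bitscore > best[query_id][1]:
            best[query_id] = (subject_id, bitscore)
    return best
```

In practice you would probably also want a minimum identity or alignment-length cutoff, and flag probes whose top hits tie across different genes rather than silently picking one.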

score 0 · Answer 2 · 2011-10-21

In regard to your last question (reannotation or detective work), I would definitely opt for reannotation. First, the human genome is likely more complete and better annotated now than when those custom probes were designed. Second, the detective work can be painstakingly slow and frustrating. That said, it may be worth a quick look to see whether there was an overall strategy in selecting these custom probes, say microRNAs, or all protein kinase exons plus exons from kinase-like genes and kinase pseudogenes. If you have that kind of documentation, it is obviously to your advantage to use it.

I have no experience with whether WABI can be made to accept a RefSeq library.