Question

Annotation for Affymetrix probe ID "241838_at"

14.9 years ago

Khader Shameer 18k

241838_at is a significant hit in a gene expression analysis that I am currently working on. The Affymetrix annotation gives the Gene Symbol for this probe as "chr6:167330486-167330903 (-)", with the additional note "This probe set was annotated using the Accession mapped clusters based pipeline to a UniGene identifier using 5 transcripts."

There is no further annotation available for this probe in ADAPT, GATExplorer or AILUN. As this particular probe is a significant hit, I would like to know how I can report it. I would also like to know how the community deals with results based on such ambiguous probes. What could be the reason for Affymetrix to keep such a non-specific probe on the chip (GATExplorer says no genes are mapped to it)?

gene microarray annotation probeset • 6.6k views

ADD COMMENT • link updated 14.9 years ago by Laurent Gautier ▴ 810 • written 14.9 years ago by Khader Shameer 18k

All the answers are good and gave me new insight into the problem. I will select the answer with the most votes as the best answer by next week.

ADD REPLY • link 14.9 years ago by Khader Shameer 18k

score 6 · Answer 1 · 2010-09-01

Why not look at the probe alignment itself on Ensembl? In this case the probe is intronic to the processed but noncoding transcript RP1-167A14.2. There are ESTs overlapping the probeset which are likely the source sequence used as evidence for inclusion of the probeset.

Affy tends to put every possible exon on the probesets and let the users puzzle out which ones are real rather than stick to a minimal canonical set of genes which may be proven wrong in the future.

You may also want to check the individual probe values for this probeset and reconcile them with any spurious mismatch alignments with other RNA species that could be causing off-target signal before proceeding further.
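One way to start on the per-probe check suggested above is sketched below. It is only an illustration, assuming you have already extracted log2 intensities per probe across your arrays; the function name, the input layout and the 3-MAD cutoff are all made up for this example and are not part of any Affymetrix or Bioconductor tool.

```python
from statistics import median


def flag_outlier_probes(probe_values, n_mads=3.0):
    """Return labels of probes whose typical signal differs from the rest.

    probe_values: dict mapping a probe label to a list of log2 intensities,
                  one per array (hypothetical layout, chosen for this sketch).
    n_mads:       how many median absolute deviations (MADs) from the
                  probeset-wide median count as "different" (arbitrary cutoff).
    """
    # Median log2 intensity of each probe across all arrays.
    probe_medians = {probe: median(vals) for probe, vals in probe_values.items()}

    # Centre and spread of those per-probe medians within the probeset.
    centre = median(probe_medians.values())
    mad = median(abs(m - centre) for m in probe_medians.values())

    if mad == 0:
        # No spread at all: anything not exactly at the centre stands out.
        return [p for p, m in probe_medians.items() if m != centre]

    return [p for p, m in probe_medians.items() if abs(m - centre) > n_mads * mad]
```

Probes flagged this way are only candidates for a closer look; whether they reflect off-target hybridisation still has to be checked by aligning their sequences.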

score 5 · Answer 2 · 2010-09-03

14.9 years ago

Laurent Gautier ▴ 810

As Daniel points out, there has been a drift between "the transcriptome as we thought we knew it" when the arrays were designed and "the transcriptome as we know it today" (shameless plug for an early reference where this was called a "Dorian Gray effect").

If you are using Bioconductor for the analysis, do consider rerunning it with remapped probes (the MBNI provides regular updates of mappings built against RefSeq and other databases; the latest is from July 2010).

Igautier, Thanks for this. This is very useful.

ADD REPLY • link 14.9 years ago by Khader Shameer 18k

I have also found the "customCDFs" (linked above) to be extremely useful. In a recent study I used both the current standard Affymetrix annotations and custom annotations to identify ~100 probesets useful for a specific classification problem. Manual validation of these probesets by alignment to reference genome found that ~10% of the standard probesets no longer work given our current understanding of the transcriptome (the problem is usually ambiguous assignment of probes to multiple loci). CustomCDF annotations had an almost perfect validation rate (unambiguous alignment to expected locus).

ADD REPLY • link 13.5 years ago by Obi Griffith 20k

One caveat: occasionally the customCDF probesets do not perform as expected. Take the U133A probesets for ESR1. From the standard CDF, only a single probeset out of nine (205225_at) works well for distinguishing ESR1+ from ESR1- patient samples (PMID:17329190). The single customCDF probeset for ESR1 doesn't work either, although alignment to the genome doesn't reveal obvious problems. So, in this case, using customCDF will give poor results for an important gene. This experience has led me to use both custom and standard probeset annotations and sort out the best probesets downstream.

ADD REPLY • link 13.5 years ago by Obi Griffith 20k

Ram · Answer 3 · 2010-09-02

14.9 years ago

Tim_Yates ▴ 110

I've got a mapping for the plus2 probes to Ensembl v58 (not the latest v59 though), and the stats I have on that probeset are:

11 probes
10 probes hit the human genome (1 misses):
  chr6:167410826-167410850 (-)
  chr6:167410818-167410842 (-)
  chr6:167410788-167410812 (-)
  chr6:167410774-167410798 (-)
  chr6:167410720-167410744 (-)
  chr6:167410653-167410677 (-)
  chr6:167410637-167410661 (-)
  chr6:167410601-167410625 (-)
  chr6:167410556-167410580 (-)
  chr6:167410543-167410567 (-)

This means that all the probes (that hit) are in the 5' intronic region of ENSG00000227598 (ENST00000444102) and also in the 5' intronic region of ENSESTG00007278250 (ENSESTT00007324270).
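For anyone who wants to repeat this kind of check, here is a small sketch that parses the ten alignments listed above and summarises them. The coordinates are copied from the list; the function names are made up for illustration, and the interval you pass to `all_within` (for example an intron taken from Ensembl) is something you have to supply yourself, since none is given here.

```python
import re

# Probe alignments copied verbatim from the list above.
PROBE_HITS = """\
chr6:167410826-167410850 (-)
chr6:167410818-167410842 (-)
chr6:167410788-167410812 (-)
chr6:167410774-167410798 (-)
chr6:167410720-167410744 (-)
chr6:167410653-167410677 (-)
chr6:167410637-167410661 (-)
chr6:167410601-167410625 (-)
chr6:167410556-167410580 (-)
chr6:167410543-167410567 (-)
"""

COORD = re.compile(r"(chr\w+):(\d+)-(\d+)\s+\(([+-])\)")


def parse_hits(text):
    """Turn 'chrN:start-end (strand)' lines into (chrom, start, end, strand)."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = COORD.fullmatch(line)
        if match is None:
            raise ValueError(f"unrecognised coordinate line: {line!r}")
        chrom, start, end, strand = match.groups()
        hits.append((chrom, int(start), int(end), strand))
    return hits


def summarize(hits):
    """Count hits and report chromosomes, strands, lengths and overall span."""
    return {
        "n_hits": len(hits),
        "chroms": {h[0] for h in hits},
        "strands": {h[3] for h in hits},
        "lengths": {h[2] - h[1] + 1 for h in hits},  # inclusive coordinates
        "start": min(h[1] for h in hits),
        "end": max(h[2] for h in hits),
    }


def all_within(hits, chrom, start, end):
    """True if every hit lies inside chrom:start-end (inclusive)."""
    return all(h[0] == chrom and start <= h[1] and h[2] <= end for h in hits)
```

Calling `summarize(parse_hits(PROBE_HITS))` gives the overall span of the probeset on this build, which can then be compared with a feature's coordinates using `all_within`.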

ADD COMMENT • link updated 5.9 years ago by Ram 45k • written 14.9 years ago by Tim_Yates ▴ 110

Basically, I ran the HG-U133_Plus_2.probe_tab file (downloaded from Affy) through my X:Map pipeline to get mappings from probes to genomic locations. (This is the same as I used to do for ADAPT, but ADAPT just scanned cDNA sequences.) I get the probe tab file, extract the probes, and then run them all through Bowtie (after generating the Bowtie index for the reference genome of interest).
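A rough sketch of the first step described above (pulling probe sequences out of the probe_tab file into FASTA so an aligner such as Bowtie can read them). This is not the X:Map pipeline itself. The column names "Probe Set Name", "Probe X", "Probe Y" and "Probe Sequence" are assumed from the usual layout of Affymetrix probe_tab files, so check them against the header of the file you downloaded; the function name is made up for illustration.

```python
import csv


def probe_tab_to_fasta(tab_handle, fasta_handle, probeset=None):
    """Write probe sequences from a tab-delimited probe file as FASTA.

    tab_handle:   open text handle on the probe_tab file (header row expected).
    fasta_handle: open text handle to write FASTA records to.
    probeset:     optional probeset ID; if given, only its probes are written.
    Returns the number of records written.
    """
    reader = csv.DictReader(tab_handle, delimiter="\t")
    written = 0
    for row in reader:
        name = row["Probe Set Name"]  # assumed column name
        if probeset is not None and name != probeset:
            continue
        # Probeset ID plus the probe's X/Y position on the chip gives a
        # FASTA header that is unique per probe.
        header = f"{name}|{row['Probe X']}|{row['Probe Y']}"
        fasta_handle.write(f">{header}\n{row['Probe Sequence']}\n")
        written += 1
    return written
```

Opening the probe_tab file and an output file with `open(...)` and passing both handles, with `probeset="241838_at"`, would write only the probes of that probeset; the resulting FASTA file is what gets handed to the aligner.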

ADD REPLY • link 14.9 years ago by Tim_Yates ▴ 110

Thanks for this information, Tim.

ADD REPLY • link 14.9 years ago by Khader Shameer 18k

Thanks for this information, Tim. Can you tell me how, and with what tool, you ran this search and obtained the mapping results?

ADD REPLY • link 14.9 years ago by Khader Shameer 18k

score 2 · Answer 4 · 2010-09-01

I think the point is that when the U133plus2 chips were designed (from a quick look at NetAffx, I think this probe is from that chip), there were a number of cDNA transcripts, in this case a cluster of them and potentially of unknown function, that the probesets were designed against. Over the course of time, this cluster hasn't become a 'gene', or indeed any particular feature that we would find mapped onto a genome build.

So this really boils down to two options: either you check your probes against a new build of the genome to make sure each one maps to something we recognise as 'real', or you use a remapped CDF file for your analysis (discussed in answers passim).

You could check the original IMAGE clones (etc., listed on NetAffx) to see whether they have been quietly sidelined, or whether they indeed map to where you think the probeset should on a genome build.

Personally I report Affy accessions rather than gene names when reporting data. It's up to somebody else (perhaps) to disambiguate the situation. Sometimes these arrays throw up things you would spend more time chasing down than is useful or practical.