Question

Common probe sets between hgu133a and hgu95a

0

6.8 years ago

rv ▴ 20

Hello everyone, I am trying to process CEL files from the hgu133a and hgu95a chips.

When I search for genes common to both:

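library(affy)
dataA <- ReadAffy ..
dataB <- ReadAffy ..
# probe set IDs present on each chip
a <- featureNames(dataA)
b <- featureNames(dataB)
# IDs shared by both chips ('common' avoids masking base::c)
common <- intersect(a, b)
length(common)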

I get only 312 probe sets.

Could you give me any suggestions or an explanation? Is there a reason for such a large difference?

affy microarray annotation hgu133a hgu95a • 1.6k views

ADD COMMENT • link updated 6.8 years ago by Kevin Blighe 88k • written 6.8 years ago by rv ▴ 20

score 4 · Accepted Answer · 2018-01-28

4

6.8 years ago

Kevin Blighe 88k

I'm guessing that some or all of those 312 that match are control probes.
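
One quick way to check that guess is to look at the ID prefixes, since Affymetrix control probe sets are named with an `AFFX` prefix. A minimal sketch in base R, using a handful of stand-in IDs chosen for illustration rather than the real `featureNames()` output:

```r
# Illustrative IDs only; in practice use featureNames(dataA) and featureNames(dataB).
a <- c("AFFX-BioB-5_at", "AFFX-CreX-3_at", "1007_s_at", "200000_s_at")
b <- c("AFFX-BioB-5_at", "AFFX-CreX-3_at", "1000_at", "1007_s_at")

common <- intersect(a, b)

# Control probe sets carry an "AFFX" prefix.
is_control <- grepl("^AFFX", common)

table(is_control)      # how many shared IDs are controls
common[!is_control]    # shared IDs that are not controls
```

If nearly all of the shared IDs turn out to be `AFFX` controls, the overlap says very little about shared genes.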

The issue is just that the probe IDs differ. You're comparing an older chip version to a newer one. Affymetrix update the probe IDs with each release of their platform, whilst retaining many older probe IDs that have not changed. In certain situations, for example, they will release a new probe that targets a different exon of the same gene as that on a previous version of the chip, or they will update the sequence/length of the probe itself.

If you proceed with each study independently and then summarise expression over each gene, you will then see overlap once you annotate everything by gene name.
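
A toy illustration of that idea in base R. The probe-to-gene maps and expression values below are invented for the example, and averaging probe sets per gene is only one possible way to summarise (nothing above prescribes a particular method):

```r
# Hypothetical probe-set-to-gene maps for two chips (made-up IDs).
mapA <- c(pA1 = "TP53", pA2 = "TP53", pA3 = "BRCA1", pA4 = "MYC")
mapB <- c(pB1 = "TP53", pB2 = "BRCA1", pB3 = "EGFR")

# Made-up expression matrices: rows are probe sets, columns are samples.
set.seed(1)
exprA <- matrix(rnorm(4 * 3), nrow = 4,
                dimnames = list(names(mapA), paste0("A", 1:3)))
exprB <- matrix(rnorm(3 * 2), nrow = 3,
                dimnames = list(names(mapB), paste0("B", 1:2)))

# Collapse probe sets to genes by averaging rows that map to the same gene.
collapse_by_gene <- function(expr, map) {
  genes <- map[rownames(expr)]
  sums <- rowsum(expr, group = genes)               # per-gene column sums
  sums / as.vector(table(genes)[rownames(sums)])    # divide by probe sets per gene
}

geneA <- collapse_by_gene(exprA, mapA)
geneB <- collapse_by_gene(exprB, mapB)

# The overlap is now defined on gene symbols rather than probe set IDs.
shared <- intersect(rownames(geneA), rownames(geneB))
shared
```

With real data, each study would first be normalised on its own, and the maps would come from the chip annotation rather than being typed in by hand.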

Kevin

ADD COMMENT • link 6.8 years ago by Kevin Blighe 88k

0

Thank you, Kevin, for your answer. Yes, I suspected something like this. If I may ask one further question: which gene identifier could I use to match different probe IDs to the same transcript? Which is the most reliable: Entrez ID?

ADD REPLY • link 6.8 years ago by rv ▴ 20

2

Well, annotations on these things are never easy and frequently provoke headaches.

For the 133 versus the 95, Affymetrix already appear to have compared these and provided some information about 'best' and 'good' matching probes, which may be of use to you. Please take a look here: Human Genome U133 Set - Support Materials (you may need to register with Affymetrix and/or ThermoFisher)

However, I would download the CSV annotation files and manually annotate with HGNC, Ensembl, or Entrez ID (or anything else). The annotations are large and comprehensive but they read into R and make the annotation process more controllable than trying to link to online databases.

The annotation CSV for the 133A is listed as 'HG-U133A Annotations, CSV format, Release 36 (19 MB, 7/12/16)' on the same page to which I linked above. Here is the page for the 95.
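
As a rough sketch of that manual approach in base R: the snippet below writes two tiny stand-in files and reads them the way the real annotation CSVs could be read. The column names (`Probe Set ID`, `Entrez Gene`), the `#` header line, the `---` placeholder for missing values, and all file contents are assumptions made for illustration, so check them against the files you actually download.

```r
# Write a tiny stand-in annotation file (invented content, assumed layout).
write_stub <- function(rows) {
  path <- tempfile(fileext = ".csv")
  writeLines(c("# stub header line",
               '"Probe Set ID","Gene Symbol","Entrez Gene"',
               rows), path)
  path
}
fileA <- write_stub(c('"pA1","TP53","7157"',
                      '"pA2","BRCA1","672"',
                      '"pA3","---","---"'))
fileB <- write_stub(c('"pB1","TP53","7157"',
                      '"pB2","EGFR","1956"'))

# Read one annotation file and keep probe set ID plus Entrez ID.
read_annot <- function(path) {
  ann <- read.csv(path, comment.char = "#", check.names = FALSE,
                  stringsAsFactors = FALSE, colClasses = "character")
  out <- data.frame(probeset = ann[["Probe Set ID"]],
                    entrez   = ann[["Entrez Gene"]],
                    stringsAsFactors = FALSE)
  out[out$entrez != "---", ]   # drop probe sets without an Entrez ID
}
annA <- read_annot(fileA)
annB <- read_annot(fileB)

# Match probe sets from the two chips through the shared gene identifier.
matched <- merge(annA, annB, by = "entrez", suffixes = c(".133A", ".95"))
matched
```

The same pattern works with HGNC symbols or Ensembl IDs in place of Entrez IDs. Probe sets annotated with more than one gene would need extra handling that this sketch leaves out.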

Hope that this helps!

ADD REPLY • link 6.8 years ago by Kevin Blighe 88k

1

Sure, it helps. Thank you very much!

ADD REPLY • link 6.8 years ago by rv ▴ 20