Hello everyone, I am trying to process CEL files from the hgu133a and hgu95a chips.
When I search for genes common to both chips:
library(affy)
dataA <- ReadAffy ..
dataB <- ReadAffy ..
a <- featureNames(dataA)
b <- featureNames(dataB)
common <- intersect(a, b)  # renamed from 'c' to avoid masking base::c
length(common)
I get only 312 probe sets.
Could you give me any suggestion or explanation? Is there a reason for such a large difference?
Thank you Kevin for your answer. Yes, I suspected something like this. If I can ask one further question: which gene identifier could I use to match different probe IDs to the same transcript? Which is the most reliable: Entrez ID?
Well, annotations on these things are never easy and frequently provoke headaches.
For the 133 versus the 95, Affymetrix already appears to have compared these and provided some information about 'best' and 'good' matching probes, which may be of use to you. Please take a look here: Human Genome U133 Set - Support Materials (you may need to register with Affymetrix and/or ThermoFisher)
However, I would download the CSV annotation files and manually annotate with HGNC, Ensembl, or Entrez ID (or anything else). The annotation files are large and comprehensive, but they can be read into R, and that makes the annotation process more controllable than trying to link to online databases.
The annotation CSV for the 133A is listed as 'HG-U133A Annotations, CSV format, Release 36 (19 MB, 7/12/16)' on the same page to which I linked above. Here is the page for the 95.
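As a rough sketch of what I mean by matching through the annotation files: the idea is to map each probe set to its Entrez ID(s) on both chips, then join on the Entrez ID rather than on the probe set name. The two small tables below are made-up stand-ins so that the code runs on its own; the probe names, the IDs, the ' /// ' separator, the '---' placeholder, and the file name in the comment are all assumptions that you should check against the files you actually download.

```r
# Toy stand-ins for the two annotation tables (all values are invented).
# With the real files, something like this may work, but the file name
# and the column names are assumptions to verify in your download:
#   annotA <- read.csv("HG-U133A.na36.annot.csv",
#                      comment.char = "#", check.names = FALSE)
annotA <- data.frame(
  probe  = c("a1_at", "a2_at", "a3_s_at", "a4_at"),
  entrez = c("100", "200 /// 201", "---", "300"),
  stringsAsFactors = FALSE
)
annotB <- data.frame(
  probe  = c("b1_at", "b2_at", "b3_at"),
  entrez = c("100", "201", "400"),
  stringsAsFactors = FALSE
)

# Expand entries listing several genes into one row per
# (probe set, Entrez ID) pair, and drop unannotated probe sets.
expand_ids <- function(annot, sep = " /// ", missing = "---") {
  ids <- strsplit(annot$entrez, sep, fixed = TRUE)
  out <- data.frame(
    probe  = rep(annot$probe, lengths(ids)),
    entrez = unlist(ids),
    stringsAsFactors = FALSE
  )
  out[out$entrez != missing & nzchar(out$entrez), , drop = FALSE]
}

mapA <- expand_ids(annotA)
mapB <- expand_ids(annotB)

# Join on Entrez ID; each row pairs a probe set from chip A with a
# probe set from chip B that is annotated to the same gene.
matched <- merge(mapA, mapB, by = "entrez", suffixes = c("_A", "_B"))

# Genes represented on both chips.
common_genes <- unique(matched$entrez)
length(common_genes)
```

Note that one gene can be covered by several probe sets on each chip, so with real data 'matched' will usually have more rows than there are common genes, and you will need to decide how to collapse them.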
Hope that this helps!
Sure, it helps. Thank you very much!