I'm interested in comparing data from two chips: one is an hgu133plus2 chip and the other is an older hgu133a chip. The manufacturer's website states that the newer chip is the "Human Genome U133 Set plus 6,500 additional genes". Looking at the annotations from Bioconductor, I get the following:
> library('hgu133a.db')
> library('hgu133plus2.db')
> a <- names(as.list(hgu133aCHR))
> b <- names(as.list(hgu133plus2CHR))
> in_plus2 <- a %in% b
> sum(!in_plus2)
[1] 6
That is, only 6 probe sets are on the older chip but not on the newer one (I can live with losing 6). Is there a better way to get the probe set names, by the way?
How safe am I to assume that the probes that have the same name are actually pointing to the same probe over the two chips? Can I just remove all those probes that aren't on the older chip and then treat the older chip and the new chip (minus the non-common probes) as essentially the same chip?
Yes, you can treat probe sets with common names as interrogating the same thing. You can confirm this by accessing the list of probes in a given probe set on each chip and then comparing the sequences of those probes. Of course, there may be slight differences in hybridization properties depending on the x-y coordinates on the array, but I'd be fairly confident those effects are small compared to the batch effects you get when comparing arrays that (presumably) were run at different times (and possibly by different labs).
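On the side question about getting the probe set names: a minimal sketch of the name comparison using set operations is below. The helper `compare_probe_ids` and the toy IDs are made up for illustration, and the commented-out lines assume that `keys()` on the annotation package object returns the probe set IDs, so treat this as one possible approach rather than the canonical one.

```r
# Hypothetical helper: split two ID vectors into shared and chip-specific sets.
compare_probe_ids <- function(ids_a, ids_b) {
  list(
    common = intersect(ids_a, ids_b),  # IDs present on both chips
    only_a = setdiff(ids_a, ids_b),    # IDs only on the first chip
    only_b = setdiff(ids_b, ids_a)     # IDs only on the second chip
  )
}

# Toy IDs standing in for the real annotation keys (illustrative only).
old_chip <- c("1007_s_at", "1053_at", "117_at", "toy_only_old_at")
new_chip <- c("1007_s_at", "1053_at", "117_at", "toy_only_new_at")

res <- compare_probe_ids(old_chip, new_chip)
print(lengths(res))

# With the annotation packages installed, the same call would look like this
# (assumption: keys() on the package object returns probe set IDs):
# library(hgu133a.db); library(hgu133plus2.db)
# res <- compare_probe_ids(keys(hgu133a.db), keys(hgu133plus2.db))
```

The `only_a` element corresponds to the 6 probe sets counted in the question, and `common` is the set you would subset both chips to.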
Thanks! I've checked that the RefSeq annotations match between the common probe sets in each library, but not the actual sequences. How would I access the sequences, independently of the annotations?
Just to chime in: Affymetrix has a policy of unique probe set names, so probe sets that share a name across arrays are identical. The best way to check this for IVT arrays is to download the target FASTA files and see if the entries are the same. Or better yet, check the probe sequences. You will find them identical.
For the plus2 chip, loading the library "hgu133plus2probe" creates a variable of that same name with six columns, including the sequence, x, y, and probe set name. You can see a bit more in http://bioconductor.org/packages/release/data/annotation/manuals/hgu133plus2probe/man/hgu133plus2probe.pdf. The other relevant packages are listed at http://bioconductor.org/packages/release/data/annotation/. Hope that helps!
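To make the sequence check concrete, here is a rough sketch of how the per-probe-set comparison could be done once both probe tables are loaded. The helper `same_sequences` and the toy tables are made up for illustration. It assumes the probe tables have `sequence` and `Probe.Set.Name` columns, and the commented-out lines assume the matching `hgu133aprobe` package for the older chip.

```r
# Hypothetical helper: for each probe set shared by two probe tables, check
# whether the probe sequences are the same (ignoring order and x/y position).
same_sequences <- function(probes_a, probes_b) {
  seq_a <- split(probes_a$sequence, probes_a$Probe.Set.Name)
  seq_b <- split(probes_b$sequence, probes_b$Probe.Set.Name)
  common <- intersect(names(seq_a), names(seq_b))
  vapply(common, function(id) {
    identical(sort(seq_a[[id]]), sort(seq_b[[id]]))
  }, logical(1))
}

# Toy probe tables (illustrative only; real tables have 25-mer probes and
# many more rows). ps2_at deliberately differs to show a mismatch.
toy_a <- data.frame(
  sequence = c("ACGTACGTAC", "TTGCATTGCA", "GGGCCCAAAT"),
  x = c(1L, 2L, 3L), y = c(1L, 1L, 1L),
  Probe.Set.Name = c("ps1_at", "ps1_at", "ps2_at"),
  stringsAsFactors = FALSE
)
toy_b <- data.frame(
  sequence = c("TTGCATTGCA", "ACGTACGTAC", "GGGCCCAAAA", "CCCCGGGGTT"),
  x = c(9L, 8L, 7L, 6L), y = c(5L, 5L, 5L, 5L),
  Probe.Set.Name = c("ps1_at", "ps1_at", "ps2_at", "ps3_at"),
  stringsAsFactors = FALSE
)

res <- same_sequences(toy_a, toy_b)
print(res)

# With the probe packages installed, the same call would look like this:
# library(hgu133aprobe); library(hgu133plus2probe)
# res <- same_sequences(hgu133aprobe, hgu133plus2probe)
# table(res)
```

Sorting the sequences within each probe set means the check ignores probe order and array position, which is what you want here since the x-y coordinates are expected to differ between the two chips.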