I am using two gene expression datasets, one from the Affy U95Av2 platform and one from the Affy U133 Plus 2.0 platform. When I map the Affy probe names to HUGO gene names, thousands of genes appear in the newer U133 Plus 2.0 dataset but not in the older U95Av2 dataset, which is expected. However, 97 genes appear in the older U95Av2 platform but not in U133 Plus 2.0. I did not expect that: U133 Plus 2.0 is a much newer platform, so I assumed it would contain every gene measured by U95Av2. What does this mean? Should I conclude that the measurements of those 97 genes on U95Av2 were unreliable, and that this is why they are absent from U133 Plus 2.0? Here are the 97 genes:
"ACSL4" "ACSM2A" "AP3S1" "AQP7" "ARPC3" "ATF4" "ATP5H" "BAK1" "BAK1P1" "CBX1" "CCL15" "CELP" "CFHR3" "CHEK2" "CLCNKA" "COL8A1" "CS" "CXorf40B" "CYP2D6" "DDI2" "EIF3F" "EIF3IP1" "EIF5AL1" "FCGR2A" "FCGR3A" "GBX1" "GPX1" "HAVCR1" "HBZ" "HIST1H2AH" "HIST1H2AI" "HIST1H2BC" "HIST1H2BJ" "HIST1H4I" "HOXA9" "HSPB1" "IFNA14" "IGF2" "IL9R" "ITGA1" "KAT7" "KRT33A" "KRTAP26-1" "LDHA" "MAGEA12" "MAP2K4P1" "MIA" "MKRN3" "MROH7" "MSX2P1" "MT1A" "MT1B" "NDUFV2" "OPHN1" "OR7E24" "PARP4" "PCDHA12" "PCDHA13" "PCDHGA12" "PCDHGB4" "PINK1-AS" "PMS2P3" "PSMC6" "PSME2" "RAB13" "RCN1" "RNF216P1" "RNF5" "RPL10A" "RPL18" "RPL27" "RPL35" "RPL37" "RPLP1" "RPS15A" "RPS26" "RPS29" "RPS5" "RPS9" "RSC1A1" "S100A7" "SAA1" "SAA4" "SNX29" "SPRR2D" "TOMM40" "UBC" "UBE2E3" "UBE2S" "UGT2B7" "UQCRFS1" "UQCRH" "VDAC2" "VENTXP7" "VOPP1" "XCL2" "ZNF799"
I have just checked one of the genes in your list, CBX1, on NetAffx, and found that probeset HG-U133_PLUS_2:201518_AT is annotated as CBX1 by Affymetrix, so it is not missing from the array. Bioconductor and Ensembl produce their own annotations for Affymetrix arrays, and these may differ somewhat from the annotations produced by Affymetrix.
I have also checked RPL27, and Affymetrix currently associates two probesets with this gene: HG-U133_PLUS_2:200025_S_AT and HG-U133_PLUS_2:213642_AT.
I used biomaRt for the mapping, not NetAffx. Here is my code:
I am a Computer Science major, so I am not very knowledgeable about how these annotations are produced, but isn't it interesting that different tools map the same probes to different genes?
Yes. The majority of probesets will be annotated the same by the different tools, but there are always some that are annotated differently.
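To make the point concrete, here is a minimal Python sketch of how one might compare two annotation sources for the same array. It is an illustration only, not the code used in this thread. The `netaffx` entries are the probeset-to-gene assignments quoted above. The `other_source` entries are hypothetical and simply assume that a second tool returns no symbol for some probesets. The function names are also made up for the example.

```python
# Probeset -> gene symbol, as quoted from NetAffx earlier in this thread.
netaffx = {
    "201518_at": "CBX1",
    "200025_s_at": "RPL27",
    "213642_at": "RPL27",
}

# HYPOTHETICAL second annotation source (not real biomaRt output):
# None means the tool returned no gene symbol for that probeset.
other_source = {
    "201518_at": None,
    "200025_s_at": "RPL27",
    "213642_at": None,
}


def genes_covered(mapping):
    """Return the set of gene symbols that have at least one probeset."""
    return {gene for gene in mapping.values() if gene}


def discordant_probesets(a, b):
    """Return probesets whose gene symbol differs between the two sources."""
    probesets = sorted(set(a) | set(b))
    return {p: (a.get(p), b.get(p)) for p in probesets if a.get(p) != b.get(p)}


# Genes that look "missing" under one annotation but are present in the other.
apparently_missing = genes_covered(netaffx) - genes_covered(other_source)
discordant = discordant_probesets(netaffx, other_source)

print(sorted(apparently_missing))
print(discordant)
```

In this toy data, RPL27 stays covered because one of its two probesets is annotated in both sources, while CBX1 appears missing only because its single probeset has no symbol in the second source. A gene dropping out of a mapped list can therefore reflect an annotation difference and not necessarily the absence of a probeset on the array.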