My honest recommendation, as someone who has analysed dozens of different types of microarrays, is to use the annotation file provided by the manufacturer (in this case, Affymetrix) and to do the annotation conversion yourself. Programs that attempt automated conversion between annotations are frequently out of date and cumbersome to use, as you have found. The manufacturer's annotation is generally the most comprehensive and the most up to date.
The exact file that you need for the HG-U219 is here:
http://www.affymetrix.com/support/technical/byproduct.affx?product=HG-U219
Look for 'HG-U219 Annotations, CSV format, Release 36 (38 MB, 4/13/16)' - you may need to register in order to download it.
----------------------
The extracted ZIP file is large, but it loads into R, where you can easily do the mappings using match() or which(). These annotation files have header lines that start with a hash (#), like this:
##For information about the Annotation file content
#%create_date=2016-03-30 GMT-08:00 16:43:06
#%chip_type=HG-U219
#%genome-species=Homo sapiens
#%genome-version=hg19
#%genome-version-ucsc=hg19
#%genome-version-ncbi=GRCh37
#%genome-version-create_date=2009-02-00
#%ensembl-date=2015-11-11
#%ensembl-version=82
...
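I would do this in R, but if you are working in Python, here is a minimal sketch of loading such a file: it keeps the '#%key=value' lines as metadata, skips every other '#' line, and parses the rest as CSV. The function name `read_affy_annotation` is my own (hypothetical), and the in-memory sample is an abridged stand-in for the real file, assuming the first non-'#' line is the column header.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_affy_annotation(handle: TextIO) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Split an annotation CSV into '#%key=value' metadata and data rows.

    Every line starting with '#' is treated as header commentary; only lines of
    the form '#%key=value' are kept as metadata. The first line that does not
    start with '#' is assumed to be the CSV column header.
    """
    metadata: Dict[str, str] = {}
    data_lines: List[str] = []
    for line in handle:
        if line.startswith("#"):
            if line.startswith("#%") and "=" in line:
                # split on the first '=' only, so values may contain '='
                key, _, value = line[2:].rstrip("\r\n").partition("=")
                metadata[key] = value
            continue
        data_lines.append(line)
    # csv.DictReader accepts any iterable of lines and handles quoted fields
    rows = list(csv.DictReader(data_lines))
    return metadata, rows


# Tiny in-memory stand-in for the real file (abridged, for illustration only)
sample = (
    "##For information about the Annotation file content\n"
    "#%chip_type=HG-U219\n"
    "#%create_date=2016-03-30 GMT-08:00 16:43:06\n"
    '"Probe Set ID","Gene Symbol"\n'
    '"11715100_at","HIST1H3G"\n'
    '"11715104_s_at","OTOP2"\n'
)
meta, rows = read_affy_annotation(io.StringIO(sample))
print(meta["chip_type"])       # HG-U219
print(rows[1]["Gene Symbol"])  # OTOP2
```

With the real download you would pass an open file instead, e.g. `open(path, newline="")`, where `path` is wherever you extracted the CSV.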
The remainder of the file is then 'shockingly' comprehensive. Here is a snapshot of a few of its columns (with some gene titles abbreviated):
| Probe Set ID | UniGene ID | Gene Title | Gene Symbol | Chromosomal Location | Ensembl |
|---|---|---|---|---|---|
| 11715100_at | Hs.247813 | histone cluster 1, H3g | HIST1H3G | chr6p22.2 | ENSG00000273983 |
| 11715101_s_at | Hs.247813 | histone cluster 1, H3g | HIST1H3G | chr6p22.2 | ENSG00000273983 |
| 11715102_x_at | Hs.247813 | histone cluster 1, H3g | HIST1H3G | chr6p22.2 | ENSG00000273983 |
| 11715103_x_at | Hs.465643 | tumor necrosis factor | TNFAIP8L1 | chr19p13.3 | ENSG00000185361 |
| 11715104_s_at | Hs.352515 | otopetrin 2 | OTOP2 | chr17q25.1 | ENSG00000183034 |
| 11715105_at | Hs.439154 | chr17 ORF 78 | C17orf78 | chr17q12 | ENSG00000278145 |
| 11715106_x_at | Hs.450233 | CTAGE family, member 15 | CTAGE15 | chr7q35 | ENSG00000271079 |
| 11715107_s_at | Hs.533543 | coag. factor VIII | F8A1 | chrXq28 | ENSG00000274791 |
| 11715108_x_at | Hs.722466 | linc RNA 1098 | LINC01098 | chr4q34.3 | ENSG00000231171 |
| 11715109_at | Hs.439922 | sterile ... cont. 7 | SAMD7 | chr3q26.2 | ENSG00000187033 |
| 11715110_at | Hs.574574 | arrestin domain cont. 5 | ARRDC5 | chr19p13.3 | ENSG00000205784 |
| 11715111_s_at | Hs.172944 | chorionic gonado., beta | CGB | chr19q13.32 | ENSG00000104818 |
| 11715112_at | Hs.531182 | glutamate rich 3 | ERICH3 | chr1p31.1 | ENSG00000178965 |
| 11715113_x_at | Hs.567527 | fam 86, member C1 | FAM86C1 | chr11q13.4 | ENSG00000158483 |
| 11715114_x_at | Hs.567527 | fam 86, member C1 | FAM86C1 | chr11q13.4 | ENSG00000158483 |
...
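The mapping itself is what match() does in R: for each of your probe set IDs, find its position in the annotation's Probe Set ID column and pull the gene symbol from the same row. A rough Python analogue is sketched below; the `match` function is my own approximation (0-based, returning None where R would return NA), and `not_on_array_at` is a hypothetical ID included to show the no-match case.

```python
from typing import Dict, List, Optional, Sequence


def match(x: Sequence[str], table: Sequence[str]) -> List[Optional[int]]:
    """Rough analogue of R's match(): for each element of x, return the 0-based
    index of its first occurrence in table, or None (R would give NA)."""
    first_index: Dict[str, int] = {}
    for i, value in enumerate(table):
        first_index.setdefault(value, i)  # keep the first occurrence only
    return [first_index.get(value) for value in x]


# Two annotation columns (values from the snapshot above)
probe_ids = ["11715100_at", "11715104_s_at", "11715112_at"]
symbols = ["HIST1H3G", "OTOP2", "ERICH3"]

# Probe set IDs as they might appear in an expression matrix (order is arbitrary)
my_probes = ["11715112_at", "11715100_at", "not_on_array_at"]
idx = match(my_probes, probe_ids)
mapped = [symbols[i] if i is not None else None for i in idx]
print(idx)     # [2, 0, None]
print(mapped)  # ['ERICH3', 'HIST1H3G', None]
```

Building the index dictionary once keeps each lookup constant-time, which matters when matching tens of thousands of probe sets.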
Here is a list of all columns in the annotation:
- Probe Set ID
- GeneChip Array
- Species Scientific Name
- Annotation Date
- Sequence Type
- Sequence Source
- Transcript ID(Array Design)
- Target Description
- Representative Public ID
- Archival UniGene Cluster
- UniGene ID
- Genome Version
- Alignments
- Gene Title
- Gene Symbol
- Chromosomal Location
- Unigene Cluster Type
- Ensembl
- Entrez Gene
- SwissProt
- EC
- OMIM
- RefSeq Protein ID
- RefSeq Transcript ID
- FlyBase
- AGI
- WormBase
- MGI Name
- RGD Name
- SGD accession number
- Gene Ontology Biological Process
- Gene Ontology Cellular Component
- Gene Ontology Molecular Function
- Pathway
- InterPro
- Trans Membrane
- QTL
- Annotation Description
- Annotation Transcript Cluster
- Transcript Assignments
- Annotation Notes
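Any two of these columns can be turned into a lookup table. One practical wrinkle: in my experience a single cell can hold several values, and empty cells are not always blank. The sketch below assumes multiple values are separated by " /// " and that "---" marks an empty cell, which is how NetAffx CSVs typically look; please check this against your own file. The function names and the `example_*` rows are hypothetical.

```python
from typing import Dict, Iterable, List, Mapping

MULTI_SEP = " /// "  # assumed separator between multiple values in one cell
MISSING = "---"      # assumed placeholder for an empty cell


def split_cell(cell: str) -> List[str]:
    """Return the individual values in a cell, or [] if the cell is empty."""
    cell = cell.strip()
    if not cell or cell == MISSING:
        return []
    return [v.strip() for v in cell.split(MULTI_SEP) if v.strip()]


def column_map(rows: Iterable[Mapping[str, str]],
               key_col: str, value_col: str) -> Dict[str, List[str]]:
    """Map each key_col value to the list of values found in value_col.
    If a key appears in several rows, the last row wins."""
    return {row[key_col]: split_cell(row[value_col]) for row in rows}


rows = [
    {"Probe Set ID": "11715100_at", "Gene Symbol": "HIST1H3G"},
    {"Probe Set ID": "example_1_at", "Gene Symbol": "GENE_A /// GENE_B"},  # hypothetical
    {"Probe Set ID": "example_2_at", "Gene Symbol": "---"},                # hypothetical
]
m = column_map(rows, "Probe Set ID", "Gene Symbol")
print(m)
```

Returning a list per probe set keeps one-to-many mappings explicit instead of silently keeping only the first symbol.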
Some probe sets will still have no gene symbol (on this array, roughly 400, of which about 100 are control probes). You can fill these in manually with values from another column (e.g. Representative Public ID), preferably within R rather than Excel, Excel for Mac, LibreOffice/OpenOffice, or another spreadsheet tool, since spreadsheets are known to silently convert some gene symbols (e.g. SEPT2, MARCH1) into dates.
Kevin