Hello, I've been given output from a custom human microarray created in 2002, which includes columns for unigene, locus, and gi. I've been using both the org.Hs.eg.db and BioMart libraries to try to map ANY of these to a standard gene symbol. Out of the 8,100 probes, only 394 of the unigenes mapped, as most of them have been deprecated. Only some 400 of the locus IDs worked when I specified refseq_mrna as a filter in the Ensembl BioMart. These all began with "NM_", but many others in the locus column start with "AV", "AK", "AA", etc. As I understand it, the "GI" field is an old GenBank gene identifier number, but I can't for the life of me find any programmatic way to look it up. What to do, bioinformatics gurus? Here's some data:
unigene locus gi
Hs.339868 NM_003974 4503358
Hs.108854 AK024569 10436879
Hs.240457 NM_004584 4759021
Hs.179735 NM_005167 4885066
Hs.76728 AV724531 10829010
Hs.288061 AK025375 10437878
Hs.125307 AA836204 2910523
Hs.288061 BC002409 12803202
Hs.251653 AK026594 10439481
Hs.74621 U29185 2865216
NA BE899595 10367264
Hs.37617 AL532303 12795796
Hs.169824 NM_002258 4504878
Hs.89887 D38081 533325
No, it's a primary key in NCBI GenBank.
Which R package and filter can I use to query it?
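Before picking a filter, it may help to split the locus column by accession type, since only the "NM_"-style RefSeq accessions matched the refseq_mrna filter. Here is a minimal sketch in Python (not R) using the sample rows above. The function name `split_by_accession_type` is made up for illustration, and the regular expression assumes RefSeq accessions look like two capital letters, an underscore, then digits; everything else is treated as a plain GenBank accession.

```python
import re

# Sample rows copied from the question: unigene, locus, gi.
SAMPLE = """\
Hs.339868 NM_003974 4503358
Hs.108854 AK024569 10436879
Hs.240457 NM_004584 4759021
Hs.179735 NM_005167 4885066
Hs.76728 AV724531 10829010
Hs.288061 AK025375 10437878
Hs.125307 AA836204 2910523
Hs.288061 BC002409 12803202
Hs.251653 AK026594 10439481
Hs.74621 U29185 2865216
NA BE899595 10367264
Hs.37617 AL532303 12795796
Hs.169824 NM_002258 4504878
Hs.89887 D38081 533325
"""

# Assumption: RefSeq accessions are two capital letters, "_", then digits
# (e.g. NM_003974). Anything else is treated as a GenBank accession.
REFSEQ_PATTERN = re.compile(r"^[A-Z]{2}_\d+$")


def split_by_accession_type(text):
    """Group (locus, gi) pairs into 'refseq' and 'genbank' buckets."""
    groups = {"refseq": [], "genbank": []}
    for line in text.strip().splitlines():
        unigene, locus, gi = line.split()
        key = "refseq" if REFSEQ_PATTERN.match(locus) else "genbank"
        groups[key].append((locus, gi))
    return groups


groups = split_by_accession_type(SAMPLE)
print(len(groups["refseq"]), "RefSeq;", len(groups["genbank"]), "GenBank")
```

The RefSeq bucket is the part that refseq_mrna already handles; the GenBank bucket is what still needs a different lookup route.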