Hi caddymob
In R you can use SQL directly on the annotation databases to do this. Using your gene aliases as examples:
# load the annotation database (DBI provides dbGetQuery)
library(org.Hs.eg.db)
library(DBI)
# set up your query genes
queryGeneNames <- c('WHRN', 'SANS')
# use SQL to join the alias table and the gene_info table (contains the symbols)
# first open the database connection
dbCon <- org.Hs.eg_dbconn()
# write your SQL query
sqlQuery <- 'SELECT * FROM alias, gene_info WHERE alias._id == gene_info._id;'
# execute the query on the database
aliasSymbol <- dbGetQuery(dbCon, sqlQuery)
# subset to get your results: column 2 is the alias, column 5 the symbol
# (%in% rather than ==, because == recycles the query vector row by row)
result <- aliasSymbol[aliasSymbol[,2] %in% queryGeneNames, 5]
result
[1] "DFNB31" "USH1G"
See the AnnotationDbi docs for more information.
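One caveat when subsetting with several query names at once: `==` compares element by element and recycles the shorter vector, so it can silently miss matches, while `%in%` tests membership. Here is a minimal base R sketch of the difference. The toy table is an assumption for illustration only: it holds the two alias/symbol pairs from the output above plus self-mappings, and is not the real alias table.

```r
# Toy alias table (assumed for illustration, not the real database content)
aliasTable <- data.frame(
  alias_symbol = c("WHRN", "DFNB31", "SANS", "USH1G"),
  symbol       = c("DFNB31", "DFNB31", "USH1G", "USH1G"),
  stringsAsFactors = FALSE
)
queryGeneNames <- c("SANS", "WHRN")

# == recycles the query: row 1 vs "SANS", row 2 vs "WHRN", row 3 vs "SANS", ...
# so only row 3 matches and the WHRN alias is missed
byEquals <- aliasTable$symbol[aliasTable$alias_symbol == queryGeneNames]

# %in% asks "is this alias one of the queries?" for every row (table order)
byIn <- aliasTable$symbol[aliasTable$alias_symbol %in% queryGeneNames]

# match() returns one symbol per query, in query order (NA if not found)
byMatch <- aliasTable$symbol[match(queryGeneNames, aliasTable$alias_symbol)]

print(byEquals)  # "USH1G"
print(byIn)      # "DFNB31" "USH1G"
print(byMatch)   # "USH1G" "DFNB31"
```

If you need the results lined up with your input list, the `match()` form is probably the most convenient, since unmatched names show up as NA instead of disappearing.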
Best
d
written 13.4 years ago by Duff (updated 6.6 years ago by Ram)
I like this approach! I actually modified my script to first check my full list of genes for synonyms using AnnotationDbi, then proceed with the rest of my work in biomaRt using the approved gene name. Thanks, Duff!
Actually, after thinking the flat file might be the only solution (in response to Larry), I was poking around genenames.org and found a way to do this. When you use the CGI download page, you get a link. Rather than saving this as a text file, I feed it right into R:
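For anyone wanting to try the same "resolve synonyms first" step, here is a rough sketch and not the actual script. It assumes the `select()` interface from AnnotationDbi with the `ALIAS` keytype; `resolveAliases` and `summariseHits` are hypothetical helper names, `NOTAGENE` is a made-up placeholder, and the toy hits table just reuses the WHRN/SANS pairs from the answer. The biomaRt step is left out because the thread doesn't show it.

```r
# Step 1 (needs Bioconductor's AnnotationDbi and org.Hs.eg.db when called):
# look up each query name as an alias and return ALIAS/SYMBOL pairs.
resolveAliases <- function(genes) {
  AnnotationDbi::select(org.Hs.eg.db::org.Hs.eg.db,
                        keys = genes, columns = "SYMBOL", keytype = "ALIAS")
}

# Step 2 (base R, hypothetical helper): keep a symbol only when the query
# resolves to exactly one; otherwise return NA so it can be checked by hand.
summariseHits <- function(genes, hits) {
  pick <- function(g) {
    s <- hits$SYMBOL[hits$ALIAS == g]
    unique(s[!is.na(s)])
  }
  nHits <- vapply(genes, function(g) length(pick(g)), integer(1),
                  USE.NAMES = FALSE)
  symbol <- vapply(genes, function(g) {
    s <- pick(g)
    if (length(s) == 1L) s else NA_character_
  }, character(1), USE.NAMES = FALSE)
  data.frame(query = genes, symbol = symbol, nHits = nHits,
             stringsAsFactors = FALSE)
}

# Toy stand-in for resolveAliases() output (pairs taken from the answer above)
toyHits <- data.frame(
  ALIAS  = c("WHRN", "SANS"),
  SYMBOL = c("DFNB31", "USH1G"),
  stringsAsFactors = FALSE
)
checked <- summariseHits(c("WHRN", "SANS", "NOTAGENE"), toyHits)
print(checked)
```

With the real database you would call `summariseHits(genes, resolveAliases(genes))` and pass only the non-NA symbols on to biomaRt. An alias can point to more than one gene, which is why the sketch reports `nHits` instead of quietly taking the first hit.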
With just 8 and in terms of getting the result out the door, I would query manually at NCBI under the HomoloGene site. You'll have to perform each search individually, but aliases are accepted most of the time. If you're unsure of aliases being an acceptable query, then search EntrezGene with the human alias and limit the search to human genes.
Yeah, there are multiple ways of getting these manually. UCSC gene search will get them too, as will WikiGenes and genenames.org. But again, I'd really like to do this programmatically rather than manually...
I strongly encourage you to use a solution from this thread, or maybe https://www.biostars.org/p/126277/, rather than random files people upload (without code) to Dropbox. I'm not saying this one is wrong or poorly done; it just isn't reproducible without code, and therefore has limited value and reliability, imho.
How many genes are in your list, and for how many organisms are you collecting the symbol-alias pairings?
In my current list, I have 8. I am trying to make my lookup as robust as possible. Basically I am trying to map human genes to other mammalian genomes based on a list of genes provided by a biologist.