I'm trying to write a function that takes a column of human gene symbols as input and returns a vector of mouse gene symbols of the same length. (I want to use it to replace the human genes in a data frame with mouse genes.)
I have tried this function using biomaRt with two different mirrors, but I'm getting connectivity issues.
convertHumanGeneList <- function(x){
  library("biomaRt")
  human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl", host="useast.ensembl.org")
  mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl", host="useast.ensembl.org")
  genesV2 <- getLDS(attributes = c("hgnc_symbol"), filters = "hgnc_symbol",
                    values = x, mart = human, attributesL = c("mgi_symbol"), martL = mouse, uniqueRows = TRUE)
  humanx <- unique(genesV2[, 2])
  return(humanx)
}
I've also tried using this function, which works for some simple vectors but not longer ones:
library(dplyr)  # needed for %>% and filter()
mouse_human_genes = read.csv("http://www.informatics.jax.org/downloads/reports/HOM_MouseHumanSequence.rpt",sep="\t")
convert_human_to_mouse <- function(gene_list){
  output = c()
  for(gene in gene_list){
    class_key = (mouse_human_genes %>% filter(Symbol == gene & Common.Organism.Name=="human"))[['DB.Class.Key']]
    if(!identical(class_key, integer(0)) ){
      human_genes = (mouse_human_genes %>% filter(DB.Class.Key == class_key & Common.Organism.Name=="mouse, laboratory"))[,"Symbol"]
      for(human_gene in human_genes){
        output = append(output,human_gene)
      }
    }
  }
  return (output)
}
> mouse_symbols <- convert_human_to_mouse(human_symbols)
There were 14 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In DB.Class.Key == class_key :
longer object length is not a multiple of shorter object length
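As far as I can tell, this warning appears when class_key holds more than one value (a human symbol that matched several rows), because `DB.Class.Key == class_key` then recycles the shorter vector instead of testing membership. A minimal sketch of the difference, using made-up integer keys rather than the real MGI table (the names `keys`, `by_equals` and `by_in` are only for illustration):

```r
# Made-up stand-ins: `keys` plays the role of mouse_human_genes$DB.Class.Key,
# `class_key` is what the filter returns when one symbol matches two classes.
keys <- c(1L, 1L, 2L, 2L, 3L)
class_key <- c(1L, 2L)

# `==` recycles class_key along keys (and warns, since 5 is not a multiple of 2),
# so rows are compared against the wrong key and some matches are missed.
by_equals <- suppressWarnings(keys == class_key)

# `%in%` asks, for each row, whether its key is any of the values in class_key.
by_in <- keys %in% class_key

print(by_equals)
print(by_in)
```

Swapping `==` for `%in%` in the second filter() call should silence the warning, although it does not by itself make the output the same length as the input.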
If I try to use this function to replace genes in my dataframe, I get:
Error in `$<-.data.frame`(`*tmp*`, TG, value = c("Trim71", "Dppa4", "Sfrp2", :
replacement has 3882 rows, data has 3957
(Probably because it cannot convert every human gene to a mouse gene, so the output is shorter than the input.)
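To illustrate what I mean by "same length", here is a rough sketch that returns NA for any symbol without a match, so the result can be assigned straight back into a data frame column. It assumes a table with the same three columns as the MGI report (DB.Class.Key, Common.Organism.Name, Symbol). The tiny `toy_homology` table and its class keys are invented for the example, and the function name `human_to_mouse_same_length` is hypothetical. It keeps only the first mouse symbol when a class has several, which may or may not be what you want.

```r
# Invented miniature of the homology table; the class keys are not real MGI keys.
toy_homology <- data.frame(
  DB.Class.Key = c(1L, 1L, 2L, 2L),
  Common.Organism.Name = c("human", "mouse, laboratory",
                           "human", "mouse, laboratory"),
  Symbol = c("TRIM71", "Trim71", "DPPA4", "Dppa4"),
  stringsAsFactors = FALSE
)

human_to_mouse_same_length <- function(gene_list, homology) {
  human <- homology[homology$Common.Organism.Name == "human", ]
  mouse <- homology[homology$Common.Organism.Name == "mouse, laboratory", ]
  # match() returns the position of the first hit, or NA when there is none,
  # so each lookup yields exactly one value per input symbol.
  class_key <- human$DB.Class.Key[match(gene_list, human$Symbol)]
  mouse$Symbol[match(class_key, mouse$DB.Class.Key)]
}

mouse_symbols_demo <- human_to_mouse_same_length(
  c("TRIM71", "NOT_A_GENE", "DPPA4"), toy_homology
)
print(mouse_symbols_demo)
```

With the real table, the second argument would be mouse_human_genes; unmatched genes then show up as NA instead of being dropped.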
Thank you, this works for the most part. I noticed the output doesn't include the gene Pou5f1 (also known as Oct4); is that expected?