Convert Human to Mouse Symbols
17 months ago
cthangav ▴ 110

I'm trying to create a working function that takes a column of human gene symbols as input and outputs a vector of mouse gene symbols that is the same length. (I'm trying to use the function to replace the human genes in a dataframe with mouse genes)

I have tried this function using biomaRt with two different mirrors, but I'm getting connectivity issues.

convertHumanGeneList <- function(x){

  library("biomaRt")
  human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl", host="useast.ensembl.org")
  mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl", host="useast.ensembl.org")

  genesV2 <- getLDS(attributes = c("hgnc_symbol"), filters = "hgnc_symbol", 
                    values = x , mart = human, attributesL = c("mgi_symbol"), martL = mouse, uniqueRows=T)

  humanx <- unique(genesV2[, 2])

  return(humanx)
}

I've also tried using this function, which works for some simple vectors but not longer ones:

mouse_human_genes = read.csv("http://www.informatics.jax.org/downloads/reports/HOM_MouseHumanSequence.rpt",sep="\t")

convert_human_to_mouse <- function(gene_list){

  output = c()

  for(gene in gene_list){
    class_key = (mouse_human_genes %>% filter(Symbol == gene & Common.Organism.Name=="human"))[['DB.Class.Key']]
    if(!identical(class_key, integer(0)) ){
      human_genes = (mouse_human_genes %>% filter(DB.Class.Key == class_key & Common.Organism.Name=="mouse, laboratory"))[,"Symbol"]
      for(human_gene in human_genes){
        output = append(output,human_gene)
      }
    }
  }

  return (output)
}

> mouse_symbols <- convert_human_to_mouse(human_symbols)
There were 14 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In DB.Class.Key == class_key :
longer object length is not a multiple of shorter object length

If I try to use this function to replace genes in my dataframe, I get:

Error in `$<-.data.frame`(`*tmp*`, TG, value = c("Trim71", "Dppa4", "Sfrp2",  : 
  replacement has 3882 rows, data has 3957

(probably because it's not able to convert all the human genes to mouse)

biomart R gene-symbols • 8.7k views
17 months ago
fracarb8 ★ 1.7k

Without convoluted loops, and using your table of genes (HOM_MouseHumanSequence.rpt), you can split the table by organism and merge the two halves on DB.Class.Key:

mouse_human_genes <- read.csv("http://www.informatics.jax.org/downloads/reports/HOM_MouseHumanSequence.rpt",sep="\t")

# separate human and mouse (select by organism name rather than by position)
by_organism <- split.data.frame(mouse_human_genes,mouse_human_genes$Common.Organism.Name)
mouse <- by_organism[["mouse, laboratory"]]
human <- by_organism[["human"]]

# keep only the homology class key and the gene symbol (columns 1 and 4)
mouse <- mouse[,c("DB.Class.Key","Symbol")]
human <- human[,c("DB.Class.Key","Symbol")]

# merge the 2 dataset  (note that the human list is longer than the mouse one)
mh_data <- merge.data.frame(mouse,human,by = "DB.Class.Key",all.y = TRUE) 

> head(mh_data)
    DB.Class.Key Symbol.x Symbol.y
1     44220139    Wdr53    WDR53
2     44220140      Tdg      TDG
3     44220141   Trarg1   TRARG1
4     44220142    Pdgfb    PDGFB
5     44220143   Gpr171   GPR171
6     44220144  Glyatl3  GLYATL3
...
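
To get a vector that is the same length as the input, one option is to look symbols up in the merged table with match(), which returns NA for anything it cannot find. This is only a sketch: the tiny mh_data below is a made-up stand-in for the real merged table, human_to_mouse is a hypothetical helper name, and only the first match is kept when a human gene has several mouse orthologs.

```r
# Toy stand-in for the merged table; the real one is built from
# HOM_MouseHumanSequence.rpt (these three rows are for illustration only)
mh_data <- data.frame(
  DB.Class.Key = c(1L, 2L, 3L),
  Symbol.x = c("Wdr53", "Tdg", "Pdgfb"),  # mouse symbols
  Symbol.y = c("WDR53", "TDG", "PDGFB"),  # human symbols
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns one mouse symbol (or NA) per input human symbol
human_to_mouse <- function(human_symbols, lookup = mh_data) {
  # match() gives the index of the first matching row, or NA if there is none
  lookup$Symbol.x[match(human_symbols, lookup$Symbol.y)]
}

human_symbols <- c("TDG", "NOT_A_GENE", "WDR53", "TDG")
mouse_symbols <- human_to_mouse(human_symbols)
print(mouse_symbols)
```

Because unmapped genes come back as NA instead of being dropped, the result can be assigned straight into a data frame column, and is.na(mouse_symbols) shows which genes had no ortholog in the table.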
17 months ago
Ming Tommy Tang ★ 4.5k

You can use this table: https://gist.github.com/crazyhottommy/4e46298045a329b47669

Thank you, this works for the most part. I noticed it doesn't include the gene Pou5f1 (also known as Oct4). Is that correct?

17 months ago
Nitin Narwade ★ 1.6k

I am not sure where I found this code, but I have been using it for gene symbol conversion for a long time. If I find the original source, I will update my answer to acknowledge it. In the meantime you can use it; it works quite well for me:

library(dplyr)  # provides %>% and filter()

convert_mouse_to_human <- function(gene_list) { 
     output = c()
     mouse_human_genes = read.csv("https://www.informatics.jax.org/downloads/reports/HOM_MouseHumanSequence.rpt",sep="\t")

     for(gene in gene_list) {
          # homology class key(s) for this mouse symbol; there may be none or several
          class_key = (mouse_human_genes %>% filter(Symbol == gene & Common.Organism.Name == "mouse, laboratory"))[['DB.Class.Key']]
          if( length(class_key) > 0 ) {
               # %in% instead of == so that several class keys do not trigger recycling warnings
               human_genes = (mouse_human_genes %>% filter(DB.Class.Key %in% class_key & Common.Organism.Name=="human"))[,"Symbol"]
               for(human_gene in human_genes) {
                    # each pair is prepended, so rows end up in reverse input order
                    output = rbind(c(gene, human_gene), output)
               }
          }
     }
     return (output)
}

convert_human_to_mouse <- function(gene_list) {
    output = c()
    mouse_human_genes = read.csv("https://www.informatics.jax.org/downloads/reports/HOM_MouseHumanSequence.rpt",sep="\t")

    for(gene in gene_list) {
          # homology class key(s) for this human symbol; there may be none or several
          class_key = (mouse_human_genes %>% filter(Symbol == gene & Common.Organism.Name == "human"))[['DB.Class.Key']]
          if( length(class_key) > 0 ) {
            mouse_genes = (mouse_human_genes %>% filter(DB.Class.Key %in% class_key & Common.Organism.Name=="mouse, laboratory"))[,"Symbol"]
            for(mouse_gene in mouse_genes) {
                output = rbind(c(gene, mouse_gene), output)
            }
          }
     }
     return (output)
}

It uses the orthologous gene pairs for mouse and human provided by the MGI database.
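
One caveat when filtering on DB.Class.Key inside a loop: if a symbol maps to more than one class key, comparing with == recycles the shorter vector, which is what produces the "longer object length is not a multiple of shorter object length" warning shown in the question. A toy illustration follows; the key values are made up and stand in for the DB.Class.Key column.

```r
# Made-up class keys standing in for the DB.Class.Key column
keys <- c(10L, 20L, 30L, 40L, 50L)

# A symbol that belongs to two homology classes yields two keys
class_key <- c(20L, 50L)

# == recycles class_key along keys (20, 50, 20, 50, 20) and compares
# position by position; R warns because 5 is not a multiple of 2
by_equals <- suppressWarnings(keys == class_key)

# %in% tests set membership, so the length of class_key does not matter
by_in <- keys %in% class_key

print(by_equals)
print(by_in)
```

Here == finds no rows at all, while %in% picks out the rows for both keys, so %in% is the safer comparison whenever class_key may hold more than one value.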

NOTE: The same strategy has been used elsewhere and implemented in python, here you can find the code

All the best :)

Regards,

Nitin N.
