Hi there,
I am using biomaRt to convert gene IDs from mouse to human for data I generated through RNA-seq. I am currently mapping them with the following function:
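convertMouseGeneList <- function(x) {
  # library() stops with an error if biomaRt is missing; require() only returns FALSE
  library(biomaRt)
  human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
  # Link the mouse and human datasets: one row per mouse symbol / human symbol pair
  genesV2 <- getLDS(attributes = c("mgi_symbol"), filters = "mgi_symbol", values = x,
                    mart = mouse, attributesL = c("hgnc_symbol"), martL = human,
                    uniqueRows = TRUE)
  # Keep only the human symbols (column 2); the pairing with the mouse symbols is dropped here
  humanx <- unique(genesV2[, 2])
  # Print the first 6 genes found to the screen
  print(head(humanx))
  return(humanx)
}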
This works, but doesn't map everything. I have a dataframe of 53569 mouse genes, and I want to map as many of them as possible to human genes (I want to put this through a bulk deconvolution package that uses a human dataset). So I am currently pulling the genes out of the dataframe into a list and attempting to convert that. However, only 18411 genes are returned. I would like to replace the mapped genes with their orthologs but keep the other genes in that same dataframe. How would I do that here?
Alternatively, I could create a new dataframe with only the mapped genes, but I would need each human gene linked back to its original mouse gene so that I retain the expression counts from the samples for the right gene. Any ideas on how I can achieve that?
Thanks!
Tom
You should provide an example of what your expected output would be. It is extremely unclear from your post which pieces of data you want to keep in your final dataframe.
You can likely use functions such as subset() or the %in% operator to create the right data.frames.
What I want to do is take the list of mouse genes, e.g. [mousegene1, mousegene2, mousegene3, mousegene4, mousegene5], and map any of them that have an ortholog to human genes, while retaining the ones that don't map, all in their original order, e.g. [humanmappedgene1, mousegene2, humanmappedgene3, humanmappedgene4, mousegene5]. This is so I can put the converted gene list directly back into the dataframe it was extracted from. The order is important because in the dataframe mousegene1 has expression info for the samples, e.g. sample 1, sample 2 and sample 3, and I want to retain the expression level information for each gene that was mapped.
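A minimal sketch of one way to get that, assuming you keep the full two-column table from getLDS() (mouse symbol in column 1, human symbol in column 2) instead of only the unique human symbols. The genesV2 and expr tables below are made-up stand-ins so the example runs without querying Ensembl, and the column names are assumptions. match() looks up each mouse gene in order, so rows are never reordered and unmapped genes keep their mouse symbol.

```r
# Hypothetical stand-in for the two-column table returned by getLDS():
# column 1 = mouse symbol, column 2 = human symbol (column names are assumed).
genesV2 <- data.frame(
  MGI.symbol  = c("mousegene1", "mousegene3", "mousegene4", "mousegene4"),
  HGNC.symbol = c("HUMANGENE1", "HUMANGENE3", "HUMANGENE4A", "HUMANGENE4B"),
  stringsAsFactors = FALSE
)

# Hypothetical expression table: one row per mouse gene, one column per sample.
expr <- data.frame(
  gene    = c("mousegene1", "mousegene2", "mousegene3", "mousegene4", "mousegene5"),
  sample1 = c(10, 0, 5, 7, 3),
  sample2 = c(12, 1, 4, 9, 2),
  stringsAsFactors = FALSE
)

# Drop pairs whose human symbol is missing or empty.
lookup <- genesV2[!is.na(genesV2[[2]]) & genesV2[[2]] != "", , drop = FALSE]

# For each gene in expr (in its existing order), find the row of its first
# match in the lookup table; NA means no ortholog was returned.
idx <- match(expr$gene, lookup[[1]])

# Flag mapped genes, and substitute the human symbol only where one exists.
expr$has_ortholog <- !is.na(idx)
expr$mapped_gene  <- ifelse(expr$has_ortholog, lookup[[2]][idx], expr$gene)

# Alternative from the question: a separate table with only the mapped genes,
# still carrying the original mouse symbol and the sample columns.
mapped_only <- subset(expr, has_ortholog)

print(expr)
print(mapped_only)
```

Note that match() keeps only the first human symbol when a mouse gene has several orthologs (mousegene4 above), and two mouse genes can map to the same human symbol, so you may need to decide how to collapse duplicates before passing the table to the deconvolution package.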