I am trying to get gene symbols for gene IDs from a mouse dataset. The gene IDs look like this: 0610009B22Rik. The code I am trying to use is the following:
ensembl <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
mouse_gene_ids <- dataset[, 1]
foo <- getBM(attributes = c('ensembl_gene_id', 'external_gene_name'),
             filters = 'genedb',
             values = mouse_gene_ids,
             mart = ensembl)
I get zero results as output after the query runs. I guess the filters parameter is wrong. Any suggestions would be greatly appreciated.
output:
Why can't I get any gene symbols?
Perhaps you want to try the entrezgene_id filter instead?
Yes. I tried it and it works. Thank you so much for the help!
Hello, I tried to follow the previous posts and the code ran without errors, but I did not get anything back as a result. My code is below:
library(biomaRt)
ensembl <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
genes_ids <- c('ENSMUSG00000051951.5', 'ENSMUSG00000025900.12', 'ENSMUSG00000025902.13')
gs_heatdata <- getBM(attributes = c("external_gene_name"),
                     filters = "mgi_symbol",
                     values = genes_ids,
                     mart = ensembl)
Hi, you need to remove the trailing numbers from the gene IDs. Also, the value for filters should be ensembl_gene_id. Please try this:
It works perfectly, but I did not understand how you managed it:
- Does the trailing number stand for the 0s before the actual ID?
- Could you explain in particular what sub('\\.[0-9]*$', '', refers to?
Thank you a lot!
That is a regular expression telling sub() to substitute a period followed by any run of digits (0 to 9) at the end of the string with nothing (i.e. delete it).
Sorry, I forgot one more question. How can I make the code "cleaner"? The final output shows me two columns that are the same, 'external_gene_name' and 'mgi_symbol'.
Thank you!
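To make the regular expression discussed above concrete, here is a minimal sketch on a few example IDs. The part after the period in an Ensembl ID is the version number of the gene record, not padding zeros. The vector names `ids` and `stripped` are only illustrative.

```r
# Example IDs: two with a version suffix, one without.
ids <- c("ENSMUSG00000051951.5", "ENSMUSG00000025900.12", "ENSMUSG00000025902")

# "\\." is a literal period, "[0-9]*" is any run of digits,
# and "$" anchors the match to the end of the string.
stripped <- sub("\\.[0-9]*$", "", ids)
print(stripped)
```

IDs that carry no suffix pass through unchanged, so the call should be safe to apply to a mixed vector.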
Change the following line
to
Or keep mgi_symbol if you want to keep that instead.
I tried with my whole dataset but it did not work. I just get back an empty table with external_gene_name and ensembl_gene_id as headers.
Hi, the converted IDs are contained in gs_heatdata. You then have to align these to the rownames of heatdata, and then replace them with the external gene IDs (MGI symbols).
Hi, how can I align them? Which function should I use? How can I then replace them with the external gene IDs? Should I first convert the row.names of heatdata into the first column and then somehow combine the df gs_heatdata with the df heatdata? Thank you a lot! :)
Hi, please take a look at functions such as which() and match(), and other functions from the dplyr package for matching data frames. A quick example:
Hi, I tried for now with match() but I think it did not work.
match() returns the indices [in heatdata] of the elements of gs_heatdata. What you likely need is:
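The snippet originally posted at this point is not preserved. As a rough sketch of the kind of alignment being described, assuming heatdata is a matrix with versioned Ensembl IDs as row names and gs_heatdata has the columns ensembl_gene_id and external_gene_name, something like the following could work. The toy objects below are hand-written stand-ins for illustration, not real query output.

```r
# Toy stand-ins (assumptions): a small expression matrix with versioned
# Ensembl IDs as row names, and a lookup table shaped like getBM() output.
heatdata <- matrix(
  1:6, nrow = 3,
  dimnames = list(
    c("ENSMUSG00000051951.5", "ENSMUSG00000025900.12", "ENSMUSG00000025902.13"),
    c("sample1", "sample2")
  )
)
gs_heatdata <- data.frame(
  ensembl_gene_id    = c("ENSMUSG00000025902", "ENSMUSG00000051951"),
  external_gene_name = c("Sox17", "Xkr4"),
  stringsAsFactors   = FALSE
)

# Strip the version suffix so the keys have the same form on both sides.
ids <- sub("\\.[0-9]*$", "", rownames(heatdata))

# For each row of heatdata, find the position of its ID in the lookup table.
idx <- match(ids, gs_heatdata$ensembl_gene_id)

# Use the symbol where one was found; otherwise keep the stripped Ensembl ID.
symbols <- ifelse(is.na(idx), ids, gs_heatdata$external_gene_name[idx])
rownames(heatdata) <- symbols
print(rownames(heatdata))
```

The lookup table is deliberately in a different order and missing one gene, since a query result is not guaranteed to come back in input order or to cover every ID. Falling back to the Ensembl ID is one design choice; dropping unmatched rows would be another.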
OK, I will try this. Just for me to understand: can I also just use the previous genes_ids, or do I have to put the entire sub('\\.[0-9]*$', '', rownames(heatdata)) in match() and after all()? Thank you!!
It returned this:
I think I found a problem, and it was right in front of me: the filters setting was wrong. I had to use filters = "ensembl_gene_id" instead of filters = "mgi_symbol". Now gs_heatdata looks good:
But if I proceed with the previous code I still get NA:
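One possible cause of the NA values, and this is only an assumption since the output is not shown, is that the row names still carry the version suffix while the IDs in gs_heatdata do not. A small check along these lines may help narrow it down; row_ids and bm_ids are hypothetical stand-ins for rownames(heatdata) and gs_heatdata$ensembl_gene_id.

```r
# Hypothetical stand-ins: versioned row names and unversioned query results.
row_ids <- c("ENSMUSG00000051951.5", "ENSMUSG00000025900.12")
bm_ids  <- c("ENSMUSG00000025900", "ENSMUSG00000051951")

# Versioned and unversioned strings are never equal, so nothing matches.
idx_raw <- match(row_ids, bm_ids)
print(idx_raw)    # [1] NA NA

# Stripping the suffix first lets the same lookup succeed.
idx_clean <- match(sub("\\.[0-9]*$", "", row_ids), bm_ids)
print(idx_clean)  # [1] 2 1

# Count how many IDs are still unmatched before replacing any row names.
n_unmatched <- sum(is.na(idx_clean))
```

If n_unmatched is still large on the real data, comparing head(rownames(heatdata)) with head(gs_heatdata$ensembl_gene_id) by eye is usually the quickest way to spot a remaining difference in format.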