Convert gene IDs to gene symbols while preserving unmatched gene IDs in DESeq2
14 months ago

Good evening,

I have a dds object with gene IDs as row names, and I need to convert them into gene symbols. The problem is that some genes do not have a match, and I don't want to lose them in the analysis. I used this procedure with biomaRt:

listEnsembl()
ensembl <- useEnsembl(biomart = "genes")
datasets <- listDatasets(ensembl)

ensembl.con <- useMart("ensembl", dataset = 'hsapiens_gene_ensembl')

attr <- listAttributes(ensembl.con)
filters <- listFilters(ensembl.con)

t <- getBM(attributes = c('ensembl_gene_id','external_gene_name'),
      filters = "ensembl_gene_id",
      values = ensembl.ids$V1,
      mart = ensembl.con)

rownames(dds)  <- t$external_gene_name[match(rownames(dds), t$ensembl_gene_id)]

But this creates NA values wherever a gene does not have a match. I need to keep the gene ID as the row name when there is no corresponding symbol, so the rows will have a mix of gene symbols and gene IDs (the latter where BioMart provides no symbol, or where none exists for that gene).

How can I do that?

Thank you for your time.

ensembl r DE deseq2 • 1.3k views

Have you tried using a simple ifelse after processing the getBM result so you're evaluating the external_gene_name column and using the ensembl_gene_id column as the NA replacement?


I am a beginner in R and I am a bit stuck. I tried this command, but the problem is that getBM() does not return a row at all for an ID that has no match.

rownames(dds) <- ifelse(is.na(t), rownames(dds), t$external_gene_name)

Do not use a variable named t. t is a commonly used function in R (matrix transpose), so you'll end up having to use its fully qualified name if you need it, and your code will confuse people.

You're on the right track. Essentially, you'll need to be sure that the result from getBM is in a data.frame. You can then add a new column (NOT rownames) based on the ifelse. Also, you need to check whether the external_gene_name column is NA, not the whole object. Try this (with the getBM result renamed to t_obj):

t_obj$identifier_to_use <- ifelse(is.na(t_obj$external_gene_name), t_obj$ensembl_gene_id, t_obj$external_gene_name)

## I don't understand why you're doing this rownames assignment, so this code follows your lead but does not endorse your usage
rownames(dds) <- t_obj$identifier_to_use
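
One caveat with the assignment above: getBM returns only the IDs it found, and not necessarily in the order of rownames(dds), so the new column may not line up with the rows. A minimal sketch of a lookup aligned to the row order, using toy stand-ins rather than a real dds (the names counts and bm and all values are hypothetical, and the empty-string check assumes biomaRt may return "" rather than NA for a missing symbol):

```r
# Toy stand-ins (hypothetical data): `counts` plays the role of dds,
# `bm` plays the role of the data.frame returned by getBM().
counts <- matrix(1:8, nrow = 4,
                 dimnames = list(c("ENSG01", "ENSG02", "ENSG03", "ENSG04"),
                                 c("s1", "s2")))
bm <- data.frame(
  ensembl_gene_id    = c("ENSG03", "ENSG01", "ENSG02"),  # ENSG04 absent, order shuffled
  external_gene_name = c("", "TP53", "BRCA1"),           # ENSG03 has no symbol
  stringsAsFactors   = FALSE
)

ids <- rownames(counts)

# Look up each row's symbol by ID; IDs absent from `bm` give NA
symbols <- bm$external_gene_name[match(ids, bm$ensembl_gene_id)]

# Fall back to the original ID where the symbol is NA or an empty string
no_symbol <- is.na(symbols) | symbols == ""
new_names <- ifelse(no_symbol, ids, symbols)

rownames(counts) <- new_names
```

Here ids, symbols, and no_symbol all have one element per row, so ifelse picks position by position as intended. Several Ensembl IDs can share one symbol; if unique row names are needed, make.unique(new_names) is one option.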

Thanks for the reply. I am doing the assignment because I have a dds object from DESeq2 whose rownames are gene IDs, so I basically need to convert them and use gene symbols as rownames instead. I came up with a partial solution, which still gives me another problem: rownames(dds) <- ifelse(rownames(dds) %in% t_obj$ensembl_gene_id, t_obj$external_gene_name, rownames(dds)).


That won't work. Try the solution I gave you. ifelse does not vectorize the way you're assuming it will.
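
To illustrate the point about ifelse with a small sketch (all identifiers and values here are made up): ifelse takes element i of the result from element i of the yes or no argument, recycling them to the length of the test. It never looks up the symbol that belongs to a given ID.

```r
ids        <- c("ENSG01", "ENSG02", "ENSG03")  # row order in the object
lookup_id  <- c("ENSG03", "ENSG01")            # lookup table: other order, one ID missing
lookup_sym <- c("GENE_C", "GENE_A")

# Positional pick: `lookup_sym` is recycled to length 3 and read by position,
# so ENSG01 receives the symbol that belongs to ENSG03
wrong <- ifelse(ids %in% lookup_id, lookup_sym, ids)
print(wrong)

# Lookup by ID: match() returns, for each element of `ids`,
# its position in `lookup_id` (NA when the ID is absent)
by_id <- lookup_sym[match(ids, lookup_id)]
print(by_id)
```

The first result silently mislabels a gene without any warning, while the second is aligned to ids and leaves NA only where no symbol exists, which can then be replaced with the ID.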
