Question

What should I do when I have different versions of Ensembl IDs?

3.7 years ago

Omar Mohamed • 0

This is a general question, but I want to know the best practice here. Sometimes, when I have RNA-seq data, the row names are Ensembl IDs, which are not very meaningful to me. When I try to map the row names to gene symbols, I get an error that row names cannot contain duplicated entries. It seems many genes have different versions (and I don't fully understand how aligners use multiple versions).
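
As a rough illustration of where duplicates can come from: the version is the `.N` suffix on an Ensembl stable ID (for example `ENSG00000141510.16`), and it can be stripped before matching against an annotation. Separately, several distinct gene IDs can map to the same symbol, which is one common reason for duplicated row names once symbols are used. The sketch below uses made-up placeholder IDs and a hypothetical `id_to_symbol` table; it is not tied to any particular annotation package.

```python
from collections import Counter


def strip_version(gene_id):
    """Return the stable Ensembl ID without its version suffix.

    "ENSG00000000001.3" -> "ENSG00000000001". Everything after the first dot
    (the version and any extra tag appended to it) is dropped.
    """
    return gene_id.split(".", 1)[0]


def find_duplicates(values):
    """Return the values that occur more than once, in first-seen order."""
    counts = Counter(values)
    duplicates = []
    for value in values:
        if counts[value] > 1 and value not in duplicates:
            duplicates.append(value)
    return duplicates


# Hypothetical row names from a count matrix (placeholder IDs, not real genes).
row_ids = ["ENSG00000000001.3", "ENSG00000000002.7", "ENSG00000000003.1"]

# Hypothetical ID -> symbol annotation in which two IDs share one symbol.
id_to_symbol = {
    "ENSG00000000001": "GENE_A",
    "ENSG00000000002": "GENE_B",
    "ENSG00000000003": "GENE_A",
}

stable_ids = [strip_version(i) for i in row_ids]
symbols = [id_to_symbol[i] for i in stable_ids]

print(find_duplicates(stable_ids))  # []
print(find_duplicates(symbols))     # ['GENE_A']
```

Here the IDs themselves are unique even after removing versions; the duplicates only appear when the IDs are replaced by symbols.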

So, what should I do about that? I think that if I keep only one of them (or the most variable one) before normalization and clustering, I would be biasing the analysis because I would be ignoring some counts!

What should I do?

R Bioconductor RNA-Seq • 665 views

updated 3.7 years ago by rpolicastro 13k • written 3.7 years ago by Omar Mohamed • 0

score 0 · Answer 1 · 2021-03-13

3.7 years ago

rpolicastro 13k

I keep all gene IDs regardless of whether they map to multiple gene names. If I want to display gene names for some visualization or data presentation (such as a list of DEGs) I'll merge the gene names into the matching gene IDs.
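
A minimal sketch of that workflow, assuming the results are held as a list of dicts with a `gene_id` field and that an ID-to-symbol table is available (both hypothetical here): symbols are attached only as an extra column, so no gene ID is dropped or collapsed. In R the same step would typically be a left join, for example `merge(res, annot, by = "gene_id", all.x = TRUE)`, where `res` and `annot` are placeholder names.

```python
def annotate(results, id_to_symbol):
    """Add a "symbol" field to each result row, keeping every gene ID.

    results: list of dicts with at least a "gene_id" key (hypothetical layout).
    id_to_symbol: dict from unversioned gene ID to symbol (hypothetical).
    Rows are never dropped or merged; IDs without a symbol get None,
    which mirrors a left join on gene ID. Input rows are not modified.
    """
    annotated = []
    for row in results:
        stable_id = row["gene_id"].split(".", 1)[0]
        annotated.append({**row, "symbol": id_to_symbol.get(stable_id)})
    return annotated


# Hypothetical DE results: two IDs that share a symbol stay as separate rows.
results = [
    {"gene_id": "ENSG00000000001.3", "log2fc": 2.1},
    {"gene_id": "ENSG00000000003.1", "log2fc": 0.4},
    {"gene_id": "ENSG00000000004.2", "log2fc": -1.3},
]
id_to_symbol = {"ENSG00000000001": "GENE_A", "ENSG00000000003": "GENE_A"}

for row in annotate(results, id_to_symbol):
    print(row["gene_id"], row["symbol"], row["log2fc"])
```

Because the gene ID remains the key, counts, normalization and testing are unaffected by how symbols happen to map.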

3.7 years ago by rpolicastro 13k

That's logical so far, but what if two versions of a gene of interest don't agree? For example, one is overexpressed and the other is not, or both are overexpressed but with different values?

3.7 years ago by Omar Mohamed • 0

You can report the gene IDs alongside the gene names.
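
One way to act on this for plots or row names, sketched in Python with placeholder IDs and a hypothetical `id_to_symbol` table: use the symbol alone when it is unique, and append the gene ID when a symbol is shared by several IDs or is missing. Each ID keeps its own values, so two IDs that share a symbol but disagree (one up-regulated, one not) stay visibly separate.

```python
from collections import Counter


def display_labels(gene_ids, id_to_symbol):
    """Build one human-readable label per gene ID.

    Assumes gene_ids are unique. Symbols shared by several IDs, and IDs
    without a symbol, get the gene ID in the label so rows stay distinguishable.
    """
    symbols = [id_to_symbol.get(g) for g in gene_ids]
    counts = Counter(s for s in symbols if s is not None)
    labels = []
    for gene_id, symbol in zip(gene_ids, symbols):
        if symbol is None:
            labels.append(gene_id)  # no symbol: fall back to the ID itself
        elif counts[symbol] > 1:
            labels.append(f"{symbol} ({gene_id})")  # shared symbol: add the ID
        else:
            labels.append(symbol)  # unique symbol: symbol alone is enough
    return labels


# Hypothetical IDs: two share GENE_A, one has no symbol.
ids = ["ENSG00000000001", "ENSG00000000002", "ENSG00000000003", "ENSG00000000004"]
id_to_symbol = {
    "ENSG00000000001": "GENE_A",
    "ENSG00000000002": "GENE_B",
    "ENSG00000000003": "GENE_A",
}

print(display_labels(ids, id_to_symbol))
```

Always writing `SYMBOL (ID)` is the simpler alternative if consistent labels matter more than brevity.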

3.7 years ago by rpolicastro 13k

Pretty neat idea. Thank you for sharing.

3.7 years ago by Omar Mohamed • 0