This is a general question, but I want to know the best practice here. Sometimes, when I have RNA-seq data, the row names are Ensembl IDs, which are not very meaningful to me. When I try to map the row names to gene symbols, I get an error that row names cannot contain duplicate entries. So, many genes have different versions (I don't fully understand how aligners use multiple versions).
So, what should I do about that? I think that if I kept only one of them, or only the most variable one, before normalization and the clustering check, I would be biasing the analysis, because I would be ignoring some counts!
What should I do?
That's logical so far, but what if, for a gene of interest, one version does not match another? For example, one is overexpressed and the other is not, or both are overexpressed but with different values?
You can report the gene IDs alongside the gene names.
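To make that concrete, here is a minimal sketch in Python with pandas (the same idea carries over to R). It assumes you already have a count matrix indexed by Ensembl ID and some ID-to-symbol lookup. The IDs, symbols, counts, and the helper name `label_rows` are all made up for illustration. Each row is labelled `SYMBOL|ID`, so the labels stay unique and no counts are dropped or merged.

```python
import pandas as pd

# Toy count matrix indexed by gene ID (placeholder IDs and counts, not real data).
counts = pd.DataFrame(
    {"sample_A": [10, 0, 25, 7], "sample_B": [12, 3, 30, 5]},
    index=["ENSG_TOY_0001", "ENSG_TOY_0002", "ENSG_TOY_0003", "ENSG_TOY_0004"],
)

# Hypothetical ID -> symbol lookup: two IDs share a symbol, one ID has no symbol.
id_to_symbol = {
    "ENSG_TOY_0001": "GENE_A",
    "ENSG_TOY_0002": "GENE_A",
    "ENSG_TOY_0003": "GENE_B",
}


def label_rows(counts, id_to_symbol):
    """Return a copy of `counts` whose row labels are 'SYMBOL|ID'.

    Rows without a known symbol keep the bare ID. Because the ID is always
    part of the label, labels are unique whenever the IDs are unique.
    """
    labels = []
    for gene_id in counts.index:
        symbol = id_to_symbol.get(gene_id)
        labels.append(f"{symbol}|{gene_id}" if symbol else gene_id)
    out = counts.copy()
    out.index = pd.Index(labels, name="gene")
    return out


labelled = label_rows(counts, id_to_symbol)
print(labelled)
```

Rows that share a symbol stay separate, so if they behave differently in the results you can see that directly and look up each ID.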
Pretty neat idea. Thank you for sharing.