4.5 years ago
Agustin Gonzalez-Vicente
What are the recommended strategies and packages for reducing RNA-seq data from Ensembl gene IDs to protein-coding genes with a valid NCBI (Entrez) Gene ID? I normally use the annotables package and then filter for “protein_coding” and !is.na(entrez). After that, I end up with a few duplicated Entrez IDs. What is the best way to deal with those?
See this previous discussion: A: How to deal with the case that one gene symbol matches multiple ensembl ids?
That is a different question and a different answer than I would give. The OP wants to know how to deal with Ensembl IDs that map to two or more NCBI Gene IDs, neither of which is as, um, unreliable as a HUGO Gene Symbol.
I tend to favor sticking with either Ensembl IDs or NCBI Gene IDs, and ignoring the differences between the two - there's no profit in trying to figure out why a given Ensembl ID maps to two NCBI Gene IDs or vice versa. Just use one or the other and be done with it.
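For readers who want to see how large the duplicate problem is before picking one ID system, here is a minimal sketch of the filtering step from the question, written in plain Python on a toy table rather than against annotables itself. The gene IDs are made up, the column names (ensgene, entrez, biotype) are only assumed to resemble an Ensembl-to-Entrez annotation table, and both helper functions are hypothetical names introduced for illustration.

```python
from collections import Counter

# Toy annotation rows with made-up IDs; real tables will be much larger.
annotation = [
    {"ensgene": "ENSG_A", "entrez": 101, "biotype": "protein_coding"},
    {"ensgene": "ENSG_B", "entrez": 102, "biotype": "protein_coding"},
    {"ensgene": "ENSG_C", "entrez": 102, "biotype": "protein_coding"},  # same Entrez ID as ENSG_B
    {"ensgene": "ENSG_D", "entrez": None, "biotype": "protein_coding"},  # no Entrez mapping
    {"ensgene": "ENSG_E", "entrez": 105, "biotype": "lncRNA"},  # not protein coding
]


def protein_coding_with_entrez(rows):
    """Keep rows that are protein coding and have a non-missing Entrez ID."""
    return [
        r for r in rows
        if r["biotype"] == "protein_coding" and r["entrez"] is not None
    ]


def duplicated_entrez(rows):
    """Map each Entrez ID that occurs more than once to its Ensembl IDs."""
    counts = Counter(r["entrez"] for r in rows)
    return {
        entrez: [r["ensgene"] for r in rows if r["entrez"] == entrez]
        for entrez, n in counts.items()
        if n > 1
    }


kept = protein_coding_with_entrez(annotation)
dups = duplicated_entrez(kept)

print([r["ensgene"] for r in kept])
print(dups)
```

This only reports which Entrez IDs are shared by several Ensembl IDs; it does not collapse or drop anything. If the analysis stays keyed on Ensembl IDs, as suggested above, the shared Entrez IDs can simply remain as an annotation column.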
Thanks. I normally prefer Entrez IDs and convert to gene names only at the very end, either to make sense of the results or when a tool only accepts HUGO symbols.
Originally posted on Bioconductor https://support.bioconductor.org/p/131622/