Annotation best practices

4.5 years ago

Agustin Gonzalez-Vicente ▴ 80

What are the recommended strategies and packages for reducing RNA-seq data from Ensembl gene IDs to protein-coding genes with a valid NCBI Gene (Entrez) ID? I normally use the annotables package and then filter for biotype "protein_coding" and !is.na(entrez). After that I still end up with a few duplicated Entrez IDs. What's the best way to deal with those?
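To make the filter and the duplicate check concrete, here is a minimal sketch in plain Python, so it runs without any annotation package. The rows are made-up placeholders, not real genes, and the field names (`ensgene`, `biotype`, `entrez`) are only assumed to mirror the annotables columns; the two helper functions are hypothetical, not part of any package.

```python
from collections import defaultdict

# Toy annotation rows with placeholder IDs (assumption: not real genes).
annotation = [
    {"ensgene": "ENSG_A", "biotype": "protein_coding", "entrez": 101},
    {"ensgene": "ENSG_B", "biotype": "protein_coding", "entrez": 102},
    {"ensgene": "ENSG_C", "biotype": "protein_coding", "entrez": 102},
    {"ensgene": "ENSG_D", "biotype": "lncRNA", "entrez": 103},
    {"ensgene": "ENSG_E", "biotype": "protein_coding", "entrez": None},
]


def protein_coding_with_entrez(rows):
    """Keep rows that are protein coding and have a non-missing Entrez ID."""
    return [
        r for r in rows
        if r["biotype"] == "protein_coding" and r["entrez"] is not None
    ]


def duplicated_entrez(rows):
    """Map each Entrez ID seen more than once to its Ensembl IDs."""
    by_entrez = defaultdict(list)
    for r in rows:
        by_entrez[r["entrez"]].append(r["ensgene"])
    return {e: ids for e, ids in by_entrez.items() if len(ids) > 1}


kept = protein_coding_with_entrez(annotation)
dups = duplicated_entrez(kept)
print([r["ensgene"] for r in kept])
print(dups)
```

Listing the Ensembl IDs behind each duplicated Entrez ID shows exactly which rows are involved before deciding what to do with them.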

RNA-Seq gene • 951 views

See this previous discussion: A: How to deal with the case that one gene symbol matches multiple ensembl ids?

4.5 years ago by GenoMax 147k

That is a different question and a different answer than I would give. The OP wants to know how to deal with Ensembl IDs that map to two or more NCBI Gene IDs, neither of which is as, um, unreliable as a HUGO gene symbol.

I tend to favor sticking with either Ensembl IDs or NCBI Gene IDs, and ignoring the differences between the two - there's no profit in trying to figure out why a given Ensembl ID maps to two NCBI Gene IDs or vice versa. Just use one or the other and be done with it.
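One way to read "just use one or the other" is sketched below in plain Python: keep the data keyed by a single ID system (Ensembl here) and carry the other ID along purely as a label, so many-to-one or one-to-many mappings never create or drop rows. The counts, the mapping, and the `annotate` helper are all hypothetical placeholders for illustration.

```python
# Toy counts keyed by Ensembl ID (placeholder IDs and values, not real data).
counts = {"ENSG_A": 10, "ENSG_B": 4, "ENSG_C": 7}

# Hypothetical Ensembl -> Entrez mapping; ENSG_B and ENSG_C share an
# Entrez ID, and ENSG_C also maps to a second one.
ensembl_to_entrez = {
    "ENSG_A": [101],
    "ENSG_B": [102],
    "ENSG_C": [102, 104],
}


def annotate(counts, mapping):
    """Return one row per Ensembl ID; Entrez IDs are attached as a label only."""
    rows = []
    for ens in sorted(counts):
        entrez = mapping.get(ens, [])
        rows.append({
            "ensgene": ens,
            "count": counts[ens],
            # Join multiple Entrez IDs instead of duplicating the row.
            "entrez": ";".join(str(e) for e in entrez),
        })
    return rows


rows = annotate(counts, ensembl_to_entrez)
for row in rows:
    print(row)
```

Because the row key never changes, the number of rows going in equals the number coming out, whatever the mapping looks like.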

4.5 years ago by James W. MacDonald ▴ 20

Thanks. I normally prefer Entrez IDs and convert to gene names only at the very end, either to make sense of the results or when a tool only accepts HUGO symbols.