Dear all,
I want to work with TCGA RNA-seq data. My list of genes of interest is annotated according to the latest HGNC update (04.06.2024), so I wanted to update the TCGA gene names the same way. However, when I match on Ensembl gene IDs, roughly 3,800 genes fail to match. I also tried matching on gene names, but even more genes failed to match.
I am still a beginner in bioinformatics and would be grateful for any tips or suggestions on how to re-annotate/update the TCGA gene names!
Thank you!
Best,
Ivana
What do you mean by "cannot match"? Can you give us an example?
This is a classic bioinformatics problem, and there is no standard way to solve it. With any mapping you are balancing false positives (FPs) against false negatives (FNs).
I normally ensemble (i.e., combine) all of the following mappings
Then you can set up a rule. Mine is: if a mapping is not unique, I inspect it manually.
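A minimal Python sketch of that rule, assuming each mapping source has already been loaded as a dictionary from a TCGA gene ID to a set of candidate HGNC symbols. The function name combine_mappings, the source names (by_ensembl_id, by_previous_symbol) and the toy IDs/symbols are hypothetical placeholders, not the actual list of mappings mentioned above. IDs where all sources agree on one symbol are accepted automatically; everything else is flagged together with the sources that proposed each candidate, so the manual inspection is easier.

```python
from collections import defaultdict


def combine_mappings(mappings):
    """Combine several ID -> symbol mappings (hypothetical helper).

    mappings: {source_name: {query_id: set_of_candidate_symbols}}
    Returns (unique, ambiguous):
      unique    -> {query_id: symbol} where all sources agree on one symbol
      ambiguous -> {query_id: {symbol: [sources]}} for manual inspection
    IDs that no source could map simply do not appear in either result.
    """
    # For every query ID, record which sources proposed which symbol.
    support = defaultdict(lambda: defaultdict(set))
    for source, mapping in mappings.items():
        for query, candidates in mapping.items():
            for symbol in candidates:
                support[query][symbol].add(source)

    unique, ambiguous = {}, {}
    for query, by_symbol in support.items():
        if len(by_symbol) == 1:
            # Every source that mapped this ID points to the same symbol.
            unique[query] = next(iter(by_symbol))
        else:
            # Sources disagree, or one source is one-to-many: flag it.
            ambiguous[query] = {s: sorted(srcs) for s, srcs in by_symbol.items()}
    return unique, ambiguous


# Hypothetical toy input, for illustration only (not real gene mappings).
mappings = {
    "by_ensembl_id": {"id1": {"GENE_A"}, "id2": {"GENE_B"}},
    "by_previous_symbol": {
        "id1": {"GENE_A"},
        "id2": {"GENE_C"},
        "id3": {"GENE_D", "GENE_E"},
    },
}
unique, ambiguous = combine_mappings(mappings)
print(unique)             # {'id1': 'GENE_A'}
print(sorted(ambiguous))  # ['id2', 'id3']
```

Here id2 is flagged because two sources disagree, and id3 because a single source maps it to two symbols; both would go to manual review under the rule above.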