Hello everybody,
I performed a DGE analysis and now I am trying to convert the gene IDs to gene names.
The problem is that virtually all of my gene IDs look something like this:
ENSG00000000003.15_4
In the list that maps gene IDs to gene names, the gene IDs have this format:
ENSG00000000003.15
I used kallisto to quantify my reads, with the genome and the GTF file from GENCODE (https://www.gencodegenes.org/human/release_33lift37.html).
Can I just delete everything after the _ and proceed? And what is the meaning of the underscore in the gene ID in general?
Any help is appreciated! Thanks!
Hmm, I have never come across underscores in GENCODE IDs. Maybe this has to do with the lift from hg38 to hg19. I personally even remove the version numbers, so stripping the underscore suffix probably does not do any harm.
Indeed, it is not part of the official format specification: https://www.ensembl.org/Help/Faq?id=488
You could contact the Ensembl Help Desk.
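For reference, the stripping discussed above could be sketched in Python roughly as follows. This is only a minimal sketch: the function name strip_gene_id and its keep_version parameter are made up for illustration, and it assumes that everything after the first underscore can be dropped safely. If your annotation contains IDs with other underscore suffixes (for example a _PAR_Y tag, if present), those would be truncated as well, so it is worth checking for duplicate IDs afterwards.

```python
def strip_gene_id(gene_id, keep_version=True):
    """Drop everything after the first underscore of a gene ID.

    Hypothetical helper for illustration only. With keep_version=False
    the ".version" part after the stable ID is removed as well.
    """
    # Keep only the part before the first "_" (no-op if there is none).
    base = gene_id.split("_", 1)[0]
    if not keep_version:
        # Keep only the part before the first "." (the stable ID).
        base = base.split(".", 1)[0]
    return base


if __name__ == "__main__":
    ids = ["ENSG00000000003.15_4", "ENSG00000000003.15"]
    print([strip_gene_id(g) for g in ids])
    print([strip_gene_id(g, keep_version=False) for g in ids])
```

With keep_version=True the result should match the ENSG00000000003.15 format of the ID-to-name list. With keep_version=False both tables would need their version numbers removed before joining.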