Question

biomaRt vs org.Hs.eg.db for Gene Annotation After Differential Expression

0

23 months ago

turcoa1 • 0

I completed a differential gene expression analysis and have close to 3000 genes that pass all filters (adjusted p-value and log2FC). I want to run some downstream analysis (GSEA or something else) on these genes, so I tried annotating them with both biomaRt and org.Hs.eg.db. In both cases, the genes that cannot be annotated are described as "novel transcript", "pseudogene", or "antisense". About 700 of the 3000 genes have these descriptions, and I am wondering whether there is any way to resolve this. Using biomaRt increased the number of annotated genes, but many still cannot be annotated. Should I throw away these genes to make downstream analysis easier? What if I throw away something important? Is this too many genes to throw away? I am stuck because I cannot find a way to recover annotation for any more genes. I am using the correct reference assembly (GRCh38.p13) and my data uses Ensembl IDs, so I expected biomaRt to give me the most annotations, but it does not. Attached is a photo of the descriptions of some of the genes that cannot be annotated. What should I do?

Annotation biomaRt • 1.1k views

updated 23 months ago by Jean-Karim Heriche 27k • written 23 months ago by turcoa1 • 0

2

I think it makes no difference whether you toss them or not. Enrichment analysis focuses on known pathways, and almost all genes in known pathway databases (such as REACTOME or KEGG) are protein-coding, so you won't get much meaning out of these "exotic" gene types, such as pseudogenes and antisense transcripts, anyway. One often simply has no idea what they do.
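To make the point concrete, here is a minimal sketch in plain Python of setting the non-coding genes aside rather than deleting them. It assumes you already have a biotype for each gene from your annotation step (for example a `gene_biotype`-style column); the gene IDs, the `annotations` table, and the `split_by_biotype` helper are all made up for illustration.

```python
from collections import Counter

# Hypothetical annotation table: gene ID -> biotype.
# The IDs are placeholders, not real Ensembl identifiers.
annotations = {
    "ENSG_EXAMPLE_0001": "protein_coding",
    "ENSG_EXAMPLE_0002": "lncRNA",
    "ENSG_EXAMPLE_0003": "processed_pseudogene",
    "ENSG_EXAMPLE_0004": "protein_coding",
    "ENSG_EXAMPLE_0005": "lncRNA",
}


def split_by_biotype(table, keep=("protein_coding",)):
    """Return (kept_ids, set_aside_ids), both sorted, based on biotype."""
    kept = sorted(g for g, b in table.items() if b in keep)
    set_aside = sorted(g for g, b in table.items() if b not in keep)
    return kept, set_aside


kept, set_aside = split_by_biotype(annotations)

# Tally biotypes so you can see how much of the list is non-coding.
biotype_counts = Counter(annotations.values())

print("biotype counts:", dict(biotype_counts))
print("use for enrichment:", kept)
print("set aside (not deleted):", set_aside)
```

Keeping the set-aside list around means nothing is lost: the enrichment runs on the genes that pathway databases can actually say something about, and the rest remain available for a separate look.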

23 months ago by ATpoint 88k

score 0 · Answer 1 · 2023-07-06

0

23 months ago

Jean-Karim Heriche 27k

What should I do?

Do experiments to characterize the genes of unknown function. You could guide your experimental work by, for example, clustering the genes and inferring which process(es) the unannotated genes may be involved in from the annotated genes present in the same cluster.
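A rough sketch of that idea in plain Python. It assumes the clustering is done on expression profiles across samples, and it uses a very simple scheme: genes are linked when their Pearson correlation is at least a threshold, and clusters are the connected groups. The gene names, expression values, process labels, and the 0.9 threshold are all hypothetical; a real analysis would more likely use an established clustering or co-expression method.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cluster_by_correlation(profiles, threshold=0.9):
    """Group genes into connected components of the graph r >= threshold."""
    genes = sorted(profiles)
    assigned = set()
    clusters = []
    for seed in genes:
        if seed in assigned:
            continue
        assigned.add(seed)
        members, stack = [seed], [seed]
        while stack:
            current = stack.pop()
            for other in genes:
                if other in assigned:
                    continue
                if pearson(profiles[current], profiles[other]) >= threshold:
                    assigned.add(other)
                    members.append(other)
                    stack.append(other)
        clusters.append(sorted(members))
    return clusters


def suggest_processes(clusters, known):
    """For each unannotated gene, list processes of annotated cluster mates."""
    suggestions = {}
    for members in clusters:
        processes = sorted({known[g] for g in members if g in known})
        for g in members:
            if g not in known:
                suggestions[g] = processes
    return suggestions


# Hypothetical expression profiles across four samples.
expression = {
    "annotated_A": [1.0, 2.0, 3.0, 4.0],
    "annotated_B": [4.0, 3.0, 2.0, 1.0],
    "novel_1": [1.1, 2.1, 2.9, 4.2],
    "novel_2": [3.9, 3.2, 1.8, 1.1],
}
# Hypothetical annotations for the genes that have one.
known_process = {"annotated_A": "process X", "annotated_B": "process Y"}

clusters = cluster_by_correlation(expression)
suggestions = suggest_processes(clusters, known_process)
print(clusters)
print(suggestions)
```

The output is only a list of hypotheses: sharing a cluster with annotated genes suggests where to look experimentally, it does not establish function.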