Hi everybody, I have a problem with my Ensembl IDs after running DESeq2 (I'm using the hg38 genome):
dds <- DESeq(ds_matrix)
res <- results(dds)
                    baseMean log2FoldChange     lfcSE      stat       pvalue         padj
ENSG00000176124.11 168.67880       4.991104 0.2797296 17.842601 3.299728e-71 6.057971e-67
As you can see, the identifiers carry a version suffix (for example ENSG00000176124.11), so when I tried to annotate the genes using
library("AnnotationDbi")
library("org.Hs.eg.db")
res$symbol <- mapIds(org.Hs.eg.db,
keys = row.names(res),
column = "SYMBOL",
keytype = "ENSEMBL",
multiVals = "first")
or using gage, the IDs with the dot and the version number after it are not recognized and cannot be matched. Does anyone know how to deal with this problem?
Thanks
The counts were generated with salmon, and the original files already include the identifiers with the dot, so I don't know whether I have to remove the suffix from the original files or whether there is another way to handle it, since the IDs with the dot are not recognized.
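A minimal base-R sketch of why the lookup fails, assuming the annotation side is keyed by unversioned Ensembl IDs: exact string matching treats "ENSG00000176124.11" and "ENSG00000176124" as different keys, and the suffix can be dropped in R on the result's row names without editing the salmon output files. The one-element lookup vector below is a stand-in for the annotation database, not its real contents.

```r
# Stand-in for an annotation table keyed by unversioned Ensembl gene IDs
# (illustration only, not the real contents of org.Hs.eg.db).
annotation_keys <- c("ENSG00000176124")

# Row name as it comes out of the salmon-based count matrix in the question.
result_ids <- c("ENSG00000176124.11")

# Exact matching fails because of the ".11" version suffix.
exact_hits <- match(result_ids, annotation_keys)

# Dropping a trailing ".<digits>" makes the keys comparable.
stripped_ids <- sub("\\.[0-9]+$", "", result_ids)
stripped_hits <- match(stripped_ids, annotation_keys)

print(exact_hits)
print(stripped_hits)
```

Because only the character vector of IDs is changed, the original quantification files can stay as they are.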
It depends on how the salmon index was generated. Generally you can just strip the .xx version suffix from your IDs to make it work, for example by changing the keys argument to:
keys = gsub("\\..*$", "", row.names(res)),
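One caveat worth checking, sketched below in base R: the pattern "\\..*$" removes everything from the first dot onward, so two distinct rows can end up with the same key. Some GENCODE releases, for instance, list pseudoautosomal genes a second time with a "_PAR_Y" suffix after the version. The IDs and the helper name strip_ensembl_version are hypothetical, chosen only to illustrate the check.

```r
# Hypothetical helper mirroring the gsub() call above: drop everything
# from the first dot onward.
strip_ensembl_version <- function(ids) {
  sub("\\..*$", "", ids)
}

# Hypothetical row names illustrating a "_PAR_Y" copy sharing a base ID.
ids <- c("ENSG00000176124.11",
         "ENSG00000182378.14",
         "ENSG00000182378.14_PAR_Y")

stripped <- strip_ensembl_version(ids)

# Stripping can collapse distinct rows onto one key; report any collisions
# before treating the stripped IDs as unique.
dups <- unique(stripped[duplicated(stripped)])
if (length(dups) > 0) {
  message("IDs not unique after stripping: ", paste(dups, collapse = ", "))
}
```

With a DESeq2 result, strip_ensembl_version(row.names(res)) would be passed as the keys argument of mapIds(); with multiVals = "first" and duplicated keys, the colliding rows simply receive the same symbol.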
Hey, I ran into the same issue after also using salmon to quantify against a GENCODE index. This seemed to work for me, but a lot of the IDs come back with NA as their symbol. Is there a way to limit the analysis to well-annotated genes? I'm not sure what to make of these results: I see huge fold changes, but mostly for features I can't identify.
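A minimal sketch of restricting the reported table to rows that received a symbol, assuming res already has the symbol column added by mapIds() as in the question. The data frame below is a mock with invented IDs, symbols and values; the same res[!is.na(res$symbol), ] subsetting is intended to apply to a DESeq2 results object. Note that this only filters what you look at: the padj values were computed across all genes that were tested, and NA here just means org.Hs.eg.db had no ENSEMBL-to-SYMBOL entry for that ID.

```r
# Mock results table standing in for 'res' after the mapIds() step;
# IDs, symbols and numbers are invented for illustration.
res <- data.frame(
  log2FoldChange = c(4.99, -2.10, 7.80),
  padj           = c(6.1e-67, 0.003, 0.001),
  symbol         = c("GENE_A", NA, "GENE_B"),
  row.names      = c("ENSG_1.1", "ENSG_2.1", "ENSG_3.1")
)

# Keep only rows whose ID mapped to a gene symbol.
res_annotated <- res[!is.na(res$symbol), ]

# Sort the annotated rows by adjusted p-value for inspection.
res_annotated <- res_annotated[order(res_annotated$padj), ]
print(res_annotated)
```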