I have added gene names and Entrez IDs to my DESeq2 results. The commands I used are:
res$hgnc_symbol <- convertIDs(gsub("\\..*","", row.names(res)), "ENSEMBL", "SYMBOL", org.Hs.eg.db)
res$entrezgene <- convertIDs(gsub("\\..*","", row.names(res)), "ENSEMBL", "ENTREZID", org.Hs.eg.db)
resOrdered <- res[order(res$pvalue),]
Then I checked the resOrdered object:
head(resOrdered)
This is the output I get:
gene_id             stat               pvalue                padj
                    <numeric>          <numeric>             <numeric>
ENSG00000280228.1   -5.9096792673878   3.42774467723773e-09  1.53944643600176e-05
ENSG00000225555.1   -5.88657749721615  3.94274922781857e-09  1.53944643600176e-05
ENSG00000234616.7   -5.77542188212913  7.67605073778235e-09  1.99807600704475e-05
ENSG00000058866.13  -4.88480635581578  1.03530552769589e-06  0.00163524543935493
ENSG00000180152.3   -4.8645382294294   1.14724332367248e-06  0.00163524543935493
ENSG00000244968.5   -4.84652440600431  1.25643137868224e-06  0.00163524543935493
                    hgnc_symbol  entrezgene
                    <character>  <character>
ENSG00000280228.1   NA           NA
ENSG00000225555.1   NA           NA
ENSG00000234616.7   JRK          8629
ENSG00000058866.13  DGKG         1608
ENSG00000180152.3   NA           NA
ENSG00000244968.5   LIFR-AS1     100506495
For some genes, the gene name and Entrez ID are missing and shown as NA.
I aligned my data to GRCh38 with STAR (using the GTF from the same assembly), and the count files were created with htseq-count.
What could be the reason for this? Please suggest how I should go forward.
Did you check your GTF file? Instead of using gene_id as the identifier, you need to use gene_name when you do the counting to get the desired output (gene names). In htseq-count this is set with the --idattr option, which defaults to gene_id. No mapping would be needed in that case. See the example annotation lines below.
gene_id "ENSG00000223972.5" = gene_name "DDX11L1"
gene_id "ENSG00000280228.1" = gene_name "AC079753.1"
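If recounting is not convenient, the same gene_id to gene_name pairing can probably be read straight from the GTF attribute column and attached to the existing results. The following is only a minimal base-R sketch: extract_gtf_attr is a hypothetical helper name, and the two attribute strings are abbreviated versions of the example lines above (real GTF attribute fields carry more keys).

```r
# Hypothetical helper (not part of any package): pull one attribute value,
# e.g. gene_name, out of GTF attribute strings of the form
#   gene_id "ENSG..."; gene_name "DDX11L1";
extract_gtf_attr <- function(attr_strings, key) {
  # \b keeps "gene_id" from matching inside longer keys such as "havana_gene_id"
  pattern <- paste0('.*\\b', key, ' "([^"]*)".*')
  hit <- grepl(pattern, attr_strings, perl = TRUE)
  out <- rep(NA_character_, length(attr_strings))  # NA where the key is absent
  out[hit] <- sub(pattern, "\\1", attr_strings[hit], perl = TRUE)
  out
}

# Abbreviated attribute strings based on the two example lines above
attrs <- c(
  'gene_id "ENSG00000223972.5"; gene_name "DDX11L1";',
  'gene_id "ENSG00000280228.1"; gene_name "AC079753.1";'
)

# Named character vector: names are gene IDs, values are gene names
lookup <- setNames(
  extract_gtf_attr(attrs, "gene_name"),
  extract_gtf_attr(attrs, "gene_id")
)

print(lookup)
```

With a real GTF, attrs would be the ninth column of the rows whose feature type is "gene", for example read with read.delim(..., header = FALSE, comment.char = "#", quote = ""). Assuming the row names of res are the same versioned IDs used in the GTF, something like res$gene_name <- unname(lookup[rownames(res)]) would then fill in the names.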
Thank you so much. If I'm not wrong, the org.Hs.eg.db package has no record for these IDs. The gene names do exist in the GTF file; I checked it manually. Yes, that's a great idea. I should use gene_name instead of gene_id when counting with htseq-count.