Hello!
I am trying to annotate my DESeq2 results with gene names and additional features. The data are human RNA-seq, and I am contrasting two diseases with healthy controls. However, I am stuck at the getBM step (I am following a tutorial from some years ago and do not know whether it is still up to date). This is my code:
dds <- DESeq(dds)
res <- results(dds)
res <- results(dds, contrast = c("disease", "LC", "Hc"))
res$ensembl <- sapply(strsplit(rownames(res), split="\\+" ), "[", 1 )
ensembl <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
genemap <- getBM(attributes = c("ensembl_gene_id", "entrezgene_id", "hgnc_symbol", "chromosome_name"),
                 #filters = "ensembl_gene_id", # with filters it does not work
                 values = res$ensembl,
                 mart = ensembl)
idx <- match(res$ensembl, genemap$ensembl_gene_id)
res$entrez <- genemap$entrezgene_id[idx]
res$gene_name <- genemap$hgnc_symbol[idx]
res$chr <- genemap$chromosome_name[idx]
write.csv( as.data.frame(res), file="results.csv" )
The tutorial recommends this part:
First, we split up the rownames of the results object, which contain ENSEMBL gene ids, separated by the plus sign, +. The following code then takes the first id for each gene by invoking the open square bracket function "[" and the argument, 1.
res$ensembl <- sapply( strsplit( rownames(res), split="\\+" ), "[", 1 )
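A minimal sketch of what that line does, using made-up row names (both IDs below are illustrative only and not taken from the data in this post). It shows that splitting on the plus sign keeps the first ID of a merged name and leaves a name without a plus sign unchanged:

```r
# Hypothetical row names: one merged with "+", one single versioned ID
ids <- c("ENSG00000000003+ENSG00000000005", "ENSG00000281764.1")

# strsplit() breaks each name at a literal plus sign;
# "\\+" escapes "+" because it is a regex metacharacter
parts <- strsplit(ids, split = "\\+")

# sapply(..., "[", 1) applies the subsetting function "[" with index 1,
# i.e. it keeps the first element of each split result
first_id <- sapply(parts, "[", 1)
print(first_id)
```

Note that a trailing ".1" is not touched by this step, since the split only acts on "+".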
But the ENSEMBL IDs in my row names look like ENSG00000281764.1, ENSG00000281299.1, and so on, so there is no plus sign to split on.
I have also tried replacing that line with res$ensembl <- rownames(res),
but that did not help.
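For reference, a minimal sketch of stripping the trailing version suffix (the ".1") from IDs like the ones above. It assumes the lookup fails because the IDs carry a version number, which is only a guess and not confirmed for this dataset. The biomaRt query itself is left out so the snippet runs offline; ids_noversion is a hypothetical variable name.

```r
# The two example IDs quoted in this post
ids <- c("ENSG00000281764.1", "ENSG00000281299.1")

# Remove a final dot followed by digits (the version suffix)
ids_noversion <- sub("\\.[0-9]+$", "", ids)
print(ids_noversion)
```

The unversioned IDs could then be tried as values together with filters = "ensembl_gene_id"; listFilters(ensembl) shows which filters the chosen dataset actually offers.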
Thank you so much for your comments!!!