Question

How to add Ensembl IDs after pseudobulk analysis with DESeq2

14 months ago

Sara ▴ 40

Hi all,

I used DESeq2 to run a pseudobulk analysis on my Seurat object, and I am having trouble converting gene names to Ensembl IDs. My row names are mixed: some are ENSG IDs and some are gene symbols. I would like to have Ensembl IDs for every gene, along with chromosome names. Here is the relevant part of my DESeq2 code:

library(DESeq2)
library(org.Hs.eg.db)  # also attaches AnnotationDbi, which provides mapIds()

dds <- DESeqDataSetFromMatrix(countData = counts_bcell,
                              colData = colData,
                              design = ~Age+Sex+condition)

#filter
keep <- rowSums(counts(dds)) >=10
dds <- dds[keep,]

colData(dds)$condition <- relevel(colData(dds)$condition, ref = "Control")

#run DESeq2
dds <- DESeq(dds, test = "LRT", reduced = ~Age+Sex)


#check the coefficients for the comparison
resultsNames(dds)

#Generate result object
res <- results(dds, name = "condition_Patient_vs_Control")

mapped <- data.frame(GeneName = rownames(res),
                     ensemblID = mapIds(org.Hs.eg.db, keys =rownames(res), keytype = "SYMBOL", column="ENSEMBL"))

res$ensembl_gene_id <- mapped$ensemblID

This is what mapped looks like. For the rows whose names are already ENSG IDs, I don't get any ensemblID:

> mapped
                       GeneName       ensemblID
ENSG00000238009 ENSG00000238009            <NA>
ENSG00000241860 ENSG00000241860            <NA>
ENSG00000290385 ENSG00000290385            <NA>
ENSG00000291215 ENSG00000291215            <NA>
ENSG00000229905 ENSG00000229905            <NA>
LINC01409             LINC01409            <NA>
ENSG00000290784 ENSG00000290784            <NA>
FAM87B                   FAM87B ENSG00000177757
LINC00115             LINC00115            <NA>

Any suggestions? Or is there a better way to add the Ensembl ID, chromosome name, and biotype?

I appreciate your help. Many thanks!

Seurat Pseudobulk single-cell DESeq2 scRNA

score 0 · Answer 1 · 2024-05-26

14 months ago

jared.andrews07 ★ 19k

Go back to your original counts matrix or input data and assign consistent IDs during its generation.

I used Seurat, and in my Seurat object the gene names are mixed: some are gene symbols and some are ENSG IDs. I then ran the pseudobulk analysis. How can I convert them, or add Ensembl IDs as an alternative in another column in Seurat?

14 months ago by Sara ▴ 40

Why is the GeneName column in mapped a mix of Ensembl IDs and gene names? Jared's point is that preprocessing should already have ensured a single, consistent identifier type (Ensembl IDs) rather than this mix. From a consistent identifier, conversion is easy: for example, load a GTF file that contains both ID and name, then left-join it onto your results.
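
The GTF-based left join can be sketched roughly as follows. This is a minimal base-R illustration, not a definitive implementation: the two GTF records are toy stand-ins (the IDs and the FAM87B pairing come from the output in the question, while the coordinates, source, and biotype values are placeholders), `get_attr` is a hypothetical helper, and the `gene_type` attribute name assumes a GENCODE-style GTF (Ensembl GTFs use `gene_biotype`). It also assumes the results are already keyed by Ensembl ID.

```r
# Toy GTF records. IDs and the FAM87B pairing come from the thread;
# coordinates, source, and biotype are placeholders, not real annotation.
gtf_lines <- c(
  'chr1\tTOY\tgene\t100\t200\t.\t+\t.\tgene_id "ENSG00000177757"; gene_type "toy_biotype"; gene_name "FAM87B";',
  'chr1\tTOY\tgene\t300\t400\t.\t-\t.\tgene_id "ENSG00000238009"; gene_type "toy_biotype"; gene_name "ENSG00000238009";'
)

# Parse the nine standard GTF columns; quote = "" keeps the quoted
# attribute values intact.
gtf <- read.delim(text = gtf_lines, header = FALSE, quote = "",
                  stringsAsFactors = FALSE,
                  col.names = c("seqname", "source", "feature", "start", "end",
                                "score", "strand", "frame", "attribute"))
gtf <- gtf[gtf$feature == "gene", ]

# Hypothetical helper: extract the value of one key from the attribute column.
get_attr <- function(attribute, key) {
  has_key <- grepl(paste0(key, ' "'), attribute, fixed = TRUE)
  value <- sub(paste0('.*', key, ' "([^"]*)".*'), "\\1", attribute)
  ifelse(has_key, value, NA_character_)
}

annot <- data.frame(
  ensembl_gene_id = get_attr(gtf$attribute, "gene_id"),
  gene_name       = get_attr(gtf$attribute, "gene_name"),
  chromosome      = gtf$seqname,
  biotype         = get_attr(gtf$attribute, "gene_type"),
  stringsAsFactors = FALSE
)

# Toy results table keyed by Ensembl ID (fold changes are placeholders).
res_df <- data.frame(
  ensembl_gene_id = c("ENSG00000238009", "ENSG00000177757", "ENSG_NOT_IN_GTF"),
  log2FoldChange  = c(0.5, -1.2, 0.1),
  stringsAsFactors = FALSE
)

# Left join: keep every result row, fill NA where the GTF has no match.
merged <- merge(res_df, annot, by = "ensembl_gene_id", all.x = TRUE)
```

With a real annotation file, a parser such as rtracklayer::import() would normally replace the manual attribute extraction, and the DESeq2 results would first be converted with as.data.frame(res), with the row names moved into the ID column.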

14 months ago by ATpoint 88k

This is exactly my point. You should not have gotten data to this state unintentionally, so you need to double-check what was done upstream to see where the swaps occurred and rectify it at that point.

14 months ago by jared.andrews07 ★ 19k

It was 10x data, and I processed it with Seurat before running the pseudobulk analysis with DESeq2. Does this mean I have to check which parameters were used in Cell Ranger, or do I have to check or change something in my Seurat analysis?

14 months ago by Sara ▴ 40
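
For 10x data, one possible way to get back to consistent IDs is the features.tsv.gz file that Cell Ranger writes next to the count matrix: its first column holds the Ensembl ID and its second column the feature name. The sketch below is only an illustration under stated assumptions. It assumes the Seurat row names were taken from that second column, and that genes without a symbol carry their Ensembl ID there, which would explain the mix seen in the question. The table is a toy stand-in built from IDs shown in the thread, and the file path in the comment is hypothetical.

```r
# Toy stand-in for the first two columns of features.tsv.gz.
# With a real run, something like this would read the file instead
# (hypothetical path):
# features <- read.delim("filtered_feature_bc_matrix/features.tsv.gz",
#                        header = FALSE,
#                        col.names = c("ensembl_gene_id", "gene_name",
#                                      "feature_type"))
features <- data.frame(
  ensembl_gene_id = c("ENSG00000238009", "ENSG00000241860", "ENSG00000177757"),
  gene_name       = c("ENSG00000238009", "ENSG00000241860", "FAM87B"),
  stringsAsFactors = FALSE
)

# Mixed row names, as in the DESeq2 results from the question.
res_names <- c("ENSG00000238009", "FAM87B", "ENSG00000241860")

# Look every row name up in the name column and return the matching ID.
# Names absent from the table come back as NA.
idx <- match(res_names, features$gene_name)
ensembl_id <- features$ensembl_gene_id[idx]
```

Two caveats: Seurat may alter feature names on import (for example, making duplicates unique or replacing underscores with dashes), so a few names may need cleaning before they match. Alternatively, reading the matrix with Read10X(data.dir, gene.column = 1) uses the Ensembl ID column for the row names from the start.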