Hi all,
I used DESeq2 to run a pseudobulk analysis on my Seurat object, and I am having trouble converting gene names to Ensembl IDs. My row names are a mix: some are Ensembl IDs (ENSG...) and some are gene symbols. I would like to have an Ensembl ID for every gene, plus the chromosome name. Here is the relevant part of my DESeq2 code for the pseudobulk analysis:
library(DESeq2)
library(org.Hs.eg.db)  # also attaches AnnotationDbi, which provides mapIds()

dds <- DESeqDataSetFromMatrix(countData = counts_bcell,
                              colData = colData,
                              design = ~ Age + Sex + condition)
# filter: keep genes with at least 10 counts in total
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep, ]
colData(dds)$condition <- relevel(colData(dds)$condition, ref = "Control")
# run DESeq2
dds <- DESeq(dds, test = "LRT", reduced = ~ Age + Sex)
# check the coefficients for the comparison
resultsNames(dds)
# generate the results object
res <- results(dds, name = "condition_Patient_vs_Control")
# map row names to Ensembl IDs, treating every row name as a gene symbol
mapped <- data.frame(GeneName = rownames(res),
                     ensemblID = mapIds(org.Hs.eg.db, keys = rownames(res),
                                        keytype = "SYMBOL", column = "ENSEMBL"))
res$ensembl_gene_id <- mapped$ensemblID
Looking at mapped (below), the rows whose names are already ENSG IDs do not get any ensemblID:
> mapped
                       GeneName       ensemblID
ENSG00000238009 ENSG00000238009            <NA>
ENSG00000241860 ENSG00000241860            <NA>
ENSG00000290385 ENSG00000290385            <NA>
ENSG00000291215 ENSG00000291215            <NA>
ENSG00000229905 ENSG00000229905            <NA>
LINC01409             LINC01409            <NA>
ENSG00000290784 ENSG00000290784            <NA>
FAM87B                   FAM87B ENSG00000177757
LINC00115             LINC00115            <NA>
Any suggestions, or is there a better way to add the Ensembl ID, chromosome name, and biotype?
I appreciate your help. Many thanks!
I used Seurat, and in my Seurat object the feature names are a mix: some are gene symbols and some are ENSG IDs. I then did the pseudobulk analysis. How can I convert them, or add the Ensembl IDs as an alternative in a separate column in Seurat?
Why is your GeneName column in mapped a mix of Ensembl IDs and gene names? What Jared wants to say is that you should already have made sure during preprocessing that only one consistent identifier type (Ensembl IDs) is present, and not this wild mix. From a consistent identifier it is easy to convert, e.g. by loading a GTF file that contains both ID and name and then doing a left join with it.
This is exactly my point. You should not have gotten the data into this state unintentionally, so you need to double-check what was done upstream to see where the swaps occurred and rectify it at that point.
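In case it helps, the left join suggested above can be sketched in base R. This is only a minimal sketch under assumptions, not anyone's actual pipeline: `annotate_mixed_ids` and `gene_table` are hypothetical names, and the toy table stands in for one parsed from the GTF of the reference used for alignment. The FAM87B ID is taken from the output in the question; the chromosome and biotype values are illustrative.

```r
# Toy annotation table standing in for one parsed from a GTF (assumed columns).
gene_table <- data.frame(
  gene_id      = c("ENSG00000177757", "ENSG00000238009"),
  gene_name    = c("FAM87B", NA),          # NA: gene without a symbol
  chromosome   = c("chr1", "chr1"),        # illustrative values
  gene_biotype = c("lncRNA", "lncRNA"),    # illustrative values
  stringsAsFactors = FALSE
)

# Hypothetical helper: annotate row names that mix Ensembl IDs and symbols.
annotate_mixed_ids <- function(ids, gene_table) {
  is_ensembl <- grepl("^ENSG[0-9]+$", ids)
  # Ensembl-style names are looked up by gene_id, everything else by gene_name.
  idx <- ifelse(is_ensembl,
                match(ids, gene_table$gene_id),
                match(ids, gene_table$gene_name))
  data.frame(
    feature         = ids,
    # An ENSG row name is already its own Ensembl ID; symbols need the table.
    ensembl_gene_id = ifelse(is_ensembl, ids, gene_table$gene_id[idx]),
    chromosome      = gene_table$chromosome[idx],
    gene_biotype    = gene_table$gene_biotype[idx],
    stringsAsFactors = FALSE
  )
}

ids <- c("ENSG00000238009", "FAM87B", "LINC00115")
annotated <- annotate_mixed_ids(ids, gene_table)
print(annotated)
```

Symbols that are missing from the table (LINC00115 in this toy case) stay NA, so they are easy to list and check against the upstream annotation. With real data, `ids` would be `rownames(res)`.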
It was 10X data, and I processed it using Seurat. Then I got to the pseudobulk step using DESeq2. Does this mean I have to check which parameters were used in Cell Ranger, or do I have to check/change something in my Seurat analysis?
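One place to look, as a hedged suggestion: Cell Ranger writes a features.tsv.gz file next to the count matrix with three columns (feature ID, feature name, feature type), and Seurat's `Read10X()` takes the feature names from the second column by default (`gene.column = 2`). If the reference annotation has no symbol for a gene, the name column may simply repeat the Ensembl ID, which would be one plausible source of the mix. The sketch below builds a lookup from such a file. The file written here is a tiny stand-in so the example is self-contained; the duplicated-symbol row uses a made-up ID, and `seurat_name` is a hypothetical column name.

```r
# Write a three-line stand-in for features.tsv.gz (ID, name, feature type).
features_path <- tempfile(fileext = ".tsv.gz")
con <- gzfile(features_path, "w")
writeLines(c(
  "ENSG00000238009\tENSG00000238009\tGene Expression",
  "ENSG00000177757\tFAM87B\tGene Expression",
  "ENSG00000999999\tFAM87B\tGene Expression"  # made-up ID, duplicate symbol
), con)
close(con)

# read.delim() detects the gzip compression on its own.
features <- read.delim(features_path, header = FALSE,
                       col.names = c("ensembl_gene_id", "gene_name", "feature_type"),
                       stringsAsFactors = FALSE)

# Mimic the names a Seurat object ends up with: duplicated names are made
# unique (".1", ".2", ...) and underscores are replaced with dashes.
features$seurat_name <- gsub("_", "-", make.unique(features$gene_name))

# Look up the Ensembl ID for each feature name found in the Seurat object.
seurat_names <- c("FAM87B", "ENSG00000238009", "FAM87B.1")
ensembl <- features$ensembl_gene_id[match(seurat_names, features$seurat_name)]
print(data.frame(seurat_names, ensembl))
```

With real data, `features_path` would point at the features.tsv.gz in the Cell Ranger output folder and `seurat_names` would be the row names of the Seurat object. Any name that comes back NA is worth checking by hand.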