Question

What should I do if the gene IDs in my reference dataset and test dataset are different?

23 months ago

paria ▴ 90

Hi everyone,

I am trying to annotate my single-cell dataset (an integrated dataset) using SingleR, but I get the error "no common genes between 'test' and 'ref'". I realized that the two datasets use different gene IDs: gene symbols in the test dataset and Ensembl gene IDs in the reference. I tried to convert the gene symbols in my test dataset to Ensembl IDs using biomaRt. I extracted the RNA assay and used its gene names as input for getBM, which returns a data frame. However, I am not sure how to integrate the Ensembl gene IDs back into the original Seurat object. I cannot move forward with the data frame alone, because SingleR takes an SCE (SingleCellExperiment) object as input. Could anyone tell me what I should do in this situation? Below is the code I used to convert the gene IDs:

library(Seurat)
library(biomaRt)

ALS <- readRDS("seurat_clustered.rds")
DefaultAssay(ALS) <- "RNA"
test_assay <- GetAssayData(ALS)

# Connect to the human gene dataset in Ensembl
ensembl <- useEnsembl(biomart = "genes")
ensembl <- useDataset("hsapiens_gene_ensembl", mart = ensembl)

# Look up Ensembl gene IDs for the gene symbols in the test dataset
Gene <- rownames(test_assay)
label <- getBM(attributes = c('ensembl_gene_id'), filters = 'external_gene_name', values = Gene, mart = ensembl)
str(label)
# 'data.frame'

I end up with a data frame that I cannot convert to a SingleCellExperiment object, which is the format of the ref dataset. I would greatly appreciate any comments.
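One way to get from a lookup data frame back to a renamed matrix is to align the table to the row order with match(). The sketch below uses base R only and toy data: the gene names, the "ENSG_TOY_*" IDs and the `lookup` table are all hypothetical, and it assumes getBM was asked for both 'external_gene_name' and 'ensembl_gene_id' so that each Ensembl ID can be paired with its symbol.

```r
# Toy count matrix with gene symbols as row names (stand-in for the RNA assay).
counts <- matrix(1:8, nrow = 4,
                 dimnames = list(c("GeneA", "GeneB", "GeneC", "GeneD"),
                                 c("cell1", "cell2")))

# Hypothetical lookup table shaped like a two-column getBM() result.
# Rows can come back in any order and some symbols may be missing.
lookup <- data.frame(
  external_gene_name = c("GeneC", "GeneA", "GeneB"),
  ensembl_gene_id    = c("ENSG_TOY_3", "ENSG_TOY_1", "ENSG_TOY_2"),
  stringsAsFactors   = FALSE
)

# Align the lookup table to the matrix row order; unmatched symbols become NA.
idx <- match(rownames(counts), lookup$external_gene_name)
new_ids <- lookup$ensembl_gene_id[idx]

# Keep only rows with a unique, non-missing ID, then rename them.
keep <- !is.na(new_ids) & !duplicated(new_ids)
counts_ens <- counts[keep, , drop = FALSE]
rownames(counts_ens) <- new_ids[keep]
```

The same `keep` and `new_ids` vectors could then be used to subset and rename a real object, since row names only need to be a character vector of the right length.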

Paria

gene-ID RNA-seq R singleR • 2.5k views

score 3 · Answer 1 · 2023-01-20

23 months ago

FrankStarling ▴ 60

I would make the SingleCellExperiment first, then change the row names from symbols to Ensembl IDs and remove any symbols that did not match. I don't really understand the comment "I cannot convert to single cell experiment object which is the same format as ref dataset". Something like this:

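library(Seurat)
library(EnsDb.Hsapiens.v86)

# Convert the Seurat object to a SingleCellExperiment
seurat_clustered.sce <- as.SingleCellExperiment(seurat_clustered)

# Map gene symbols to Ensembl gene IDs (NA where no match is found)
ens <- mapIds(EnsDb.Hsapiens.v86,
              keys = rownames(seurat_clustered.sce),
              column = 'GENEID',
              keytype = 'SYMBOL')
# Sanity check: the mapping is in the same order as the rows
all(rownames(seurat_clustered.sce) == names(ens))

# Drop unmatched genes, then use the Ensembl IDs as row names
keep <- !is.na(ens)
ens <- ens[keep]
seurat_clustered.sce <- seurat_clustered.sce[keep,]
rownames(seurat_clustered.sce) <- ens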
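After renaming, it may be worth checking how many IDs the test and reference actually share before running SingleR. The following is a minimal base-R sketch with made-up IDs (the "ENSG_TOY_*" values are hypothetical). It also assumes the reference might carry version suffixes such as ".5" on its Ensembl IDs, which is one possible cause of a poor overlap, not something confirmed for this dataset.

```r
# Hypothetical row names of the test and reference objects.
test_ids <- c("ENSG_TOY_1", "ENSG_TOY_2", "ENSG_TOY_3")
ref_ids  <- c("ENSG_TOY_2.5", "ENSG_TOY_3.12", "ENSG_TOY_4.1")

# Overlap before any clean-up.
common_raw <- intersect(test_ids, ref_ids)

# Strip a trailing ".<digits>" version suffix, if present, and compare again.
ref_ids_clean <- sub("\\.[0-9]+$", "", ref_ids)
common <- intersect(test_ids, ref_ids_clean)

# Fraction of test genes that are also found in the reference.
frac_shared <- length(common) / length(test_ids)
```

If the fraction stays close to zero after this kind of clean-up, the two datasets are probably still using different ID types.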

Thanks for your help! Paria

Reply 23 months ago by paria ▴ 90

Here is the code that worked for me

# read in the ref dataset
library(EnsDb.Hsapiens.v79)
reference <- readRDS("DR.rds")

# convert Ensembl IDs to gene symbols; I found the best dataset is EnsDb.Hsapiens.v79
ens <- mapIds(EnsDb.Hsapiens.v79, keys = reference@assays$RNA@data@Dimnames[[1]], column = 'SYMBOL', keytype = 'GENEID')

length(ens) == length(reference@assays$RNA@data@Dimnames[[1]]) # it should return TRUE

# remove NA values from ens
keep <- !is.na(ens)
ens <- ens[keep]

# replace the gene names in the Seurat object
reference <- reference[keep,]
reference@assays$RNA@data@Dimnames[[1]] <- ens
names(reference@assays$RNA@data@Dimnames[[1]]) <- NULL
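When going from Ensembl IDs to symbols, more than one ID can map to the same symbol, so the new row names may contain duplicates. Below is a small base-R sketch of two possible ways to handle that, using toy symbols (hypothetical, not taken from the dataset above): either drop the repeated entries or make them unique with make.unique().

```r
# Hypothetical symbols after conversion; "GeneA" appears twice.
symbols <- c("GeneA", "GeneA", "GeneB")

# Option 1: keep only the first occurrence of each symbol.
keep_first <- !duplicated(symbols)
symbols_first <- symbols[keep_first]

# Option 2: keep every row but append a numeric suffix to repeats.
symbols_unique <- make.unique(symbols)
```

Which option is better depends on whether the duplicated genes matter for the annotation; dropping them keeps the names comparable with the other dataset.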

Reply 23 months ago by paria ▴ 90