Hi everyone,
I am trying to annotate my single-cell dataset (an integrated dataset) using SingleR. However, I am getting the error "no common genes between 'test' and 'ref'". I realized that the gene IDs differ between the two datasets (gene symbols in the test dataset and Ensembl gene IDs in the ref). I tried to convert the gene symbols in my test dataset to Ensembl IDs using biomaRt. I extracted the RNA assay data and passed its row names to getBM, which returns a data frame. But I am not sure how to put the Ensembl gene IDs back into the original Seurat object. I cannot move forward with the data frame I have because the input for SingleR is an SCE (SingleCellExperiment) object. Could anyone tell me what I should do in this situation? Below is the code I used to convert the gene IDs:
library(Seurat)
library(biomaRt)
ALS <- readRDS("seurat_clustered.rds")
# use the RNA assay rather than the integrated one
DefaultAssay(ALS) <- "RNA"
test_assay <- GetAssayData(ALS)
ensembl <- useEnsembl(biomart = "genes")
ensembl <- useDataset("hsapiens_gene_ensembl", mart = ensembl)
Gene <- rownames(test_assay)
# also return the symbol so each Ensembl ID can be matched back to its row
label <- getBM(attributes = c('external_gene_name', 'ensembl_gene_id'), filters = 'external_gene_name', values = Gene, mart = ensembl)
str(label)
# 'data.frame'
I end up with a data frame that I cannot convert to a SingleCellExperiment object, which is the format of the ref dataset. I would greatly appreciate any comments.
Paria
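For the direction asked about above (symbols to Ensembl IDs on the test side), here is a minimal base R sketch of how a getBM-style mapping table could be matched back onto the rows of an expression matrix. The matrix `expr` and the table `mapping` are toy stand-ins, not the real data, and the sketch assumes the mapping has both the `external_gene_name` and `ensembl_gene_id` columns. Rows without a mapping are dropped, and only the first row is kept for any repeated ID.

```r
# Toy stand-ins (hypothetical data): a symbol-named expression matrix and a
# getBM()-style mapping table with both the symbol and the Ensembl ID.
expr <- matrix(1:8, nrow = 4,
               dimnames = list(c("TP53", "GAPDH", "FAKE1", "ACTB"),
                               c("cell1", "cell2")))
mapping <- data.frame(
  external_gene_name = c("GAPDH", "TP53", "ACTB"),
  ensembl_gene_id    = c("ENSG00000111640", "ENSG00000141510", "ENSG00000075624"),
  stringsAsFactors   = FALSE
)

# Look up each row's symbol in the mapping; unmapped symbols give NA.
ids <- mapping$ensembl_gene_id[match(rownames(expr), mapping$external_gene_name)]

# Keep rows that have an ID, and only the first row for any repeated ID.
keep <- !is.na(ids) & !duplicated(ids)
expr_ens <- expr[keep, , drop = FALSE]
rownames(expr_ens) <- ids[keep]
```

As far as I know, SingleR also accepts a plain numeric matrix (genes in rows, cells in columns) as `test`, so a renamed matrix like `expr_ens` may be usable without rebuilding the Seurat object; please check this against the SingleR documentation for your version.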
Thanks for your help! Paria
Here is the code that worked for me
# read in the ref dataset
library(EnsDb.Hsapiens.v79)
reference <- readRDS("DR.rds")
# convert Ensembl IDs to gene symbols; I found the best annotation package is EnsDb.Hsapiens.v79
ens <- mapIds(EnsDb.Hsapiens.v79, keys = reference@assays$RNA@data@Dimnames[[1]], column = 'SYMBOL', keytype = 'GENEID')
length(ens) == length(reference@assays$RNA@data@Dimnames[[1]]) # it should return TRUE
# remove NA values from ens
keep <- !is.na(ens)
ens <- ens[keep]
# replace the gene names in the Seurat object
reference <- reference[keep,]
reference@assays$RNA@data@Dimnames[[1]] = ens
names(reference@assays$RNA@data@Dimnames[[1]]) = c()
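After renaming, it may be worth checking two things before running SingleR: that the test and reference now share gene names, and that the conversion did not create duplicated symbols (more than one Ensembl ID can map to the same symbol). A minimal base R sketch with made-up gene vectors standing in for the row names of the two objects:

```r
# Hypothetical gene names standing in for rownames(test) and rownames(reference).
test_genes <- c("TP53", "GAPDH", "ACTB", "MALAT1")
ref_genes  <- c("GAPDH", "ACTB", "ACTB", "CD3E")

# Genes present in both; an empty result would explain the "no common genes" error.
common <- intersect(test_genes, ref_genes)
n_common <- length(common)

# Symbols that occur more than once in the reference after conversion.
dup_ref <- unique(ref_genes[duplicated(ref_genes)])
```

If `dup_ref` is not empty, one option is to keep only the first occurrence of each symbol before subsetting, but how to resolve duplicates is a judgment call that depends on the dataset.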