What should I do if the gene IDs in my reference dataset and test dataset are different?
22 months ago
paria ▴ 90

Hi everyone,

I am trying to annotate my single-cell dataset (an integrated dataset) using SingleR. However, I am getting the error "no common genes between 'test' and 'ref'". I realized that the gene IDs differ between the two datasets (gene symbols in the test dataset and Ensembl gene IDs in the reference). I tried to convert the gene symbols in my test dataset to Ensembl IDs using biomaRt. I extracted the RNA assay and passed its gene names to getBM, which returns a data frame. But I am not sure how to bring the Ensembl gene IDs back into the original Seurat object. I cannot move forward with the data frame I have because the input for SingleR is an SCE (SingleCellExperiment) object. Could anyone tell me what I should do in this situation? Below is the code I used to convert the gene IDs:

library(Seurat)
library(biomaRt)

ALS <- readRDS("seurat_clustered.rds")
DefaultAssay(ALS) <- "RNA"  # set the active assay on the object
test_assay <- GetAssayData(ALS)
ensembl <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")
Gene <- rownames(test_assay)
# request the symbol together with the Ensembl ID so each ID can be matched back to its row
label <- getBM(attributes = c("external_gene_name", "ensembl_gene_id"), filters = "external_gene_name", values = Gene, mart = ensembl)
str(label)
# 'data.frame'

I have a data frame that I cannot convert to a SingleCellExperiment object, which is the format of the reference dataset. I would greatly appreciate any comments.
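One way to connect a getBM-style data frame back to the expression matrix is to align it with the matrix rows using match(). The sketch below is base R only and rests on assumptions: the toy matrix, the gene names (GENE1 to GENE4) and the IDs (ENSG_A and so on) are made-up placeholders, and the data frame is typed in by hand to stand in for a biomaRt result with the columns external_gene_name and ensembl_gene_id.

```r
# Toy expression matrix with gene symbols as row names (hypothetical data).
counts <- matrix(1:8, nrow = 4,
                 dimnames = list(c("GENE1", "GENE2", "GENE3", "GENE4"),
                                 c("cell1", "cell2")))

# Hand-written stand-in for a getBM() result: rows are in a different order,
# GENE3 has no hit, and GENE2 comes back with two IDs.
label <- data.frame(
  external_gene_name = c("GENE2", "GENE1", "GENE2", "GENE4"),
  ensembl_gene_id    = c("ENSG_B1", "ENSG_A", "ENSG_B2", "ENSG_D"),
  stringsAsFactors   = FALSE
)

# match() gives the first hit for each row name, or NA when there is none,
# so the result has exactly one entry per matrix row, in matrix order.
idx <- match(rownames(counts), label$external_gene_name)
ens <- label$ensembl_gene_id[idx]

# Drop unmapped genes, then rename the remaining rows.
keep <- !is.na(ens)
counts_ens <- counts[keep, , drop = FALSE]
rownames(counts_ens) <- ens[keep]
print(counts_ens)
```

The same `keep` and `ens` vectors could be applied to the rows of a SingleCellExperiment instead of a plain matrix. Taking the first hit for a symbol with several IDs is a simplification that may not suit every dataset.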

Paria

gene-ID RNA-seq R singleR • 2.4k views
22 months ago

I would make the SingleCellExperiment first, then change the row names from symbols to Ensembl IDs and remove any symbols that did not match. I don't really understand the comment "I cannot convert to single cell experiment object which is the same format as ref dataset". Something like this:

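library(Seurat)
library(EnsDb.Hsapiens.v86)

# seurat_clustered is the Seurat object, e.g. readRDS("seurat_clustered.rds")
seurat_clustered.sce <- as.SingleCellExperiment(seurat_clustered)

# map each gene symbol to an Ensembl gene ID (NA where there is no match)
ens <- mapIds(EnsDb.Hsapiens.v86,
              keys = rownames(seurat_clustered.sce),
              column = 'GENEID',
              keytype = 'SYMBOL')
# sanity check: should be TRUE, i.e. the mapping is in the same order as the rows
all(rownames(seurat_clustered.sce) == names(ens))

# drop genes without an Ensembl ID, then rename the remaining rows
keep <- !is.na(ens)
ens <- ens[keep]
seurat_clustered.sce <- seurat_clustered.sce[keep,]
rownames(seurat_clustered.sce) <- ens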
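After renaming, it can help to check how many IDs the test object now shares with the reference before calling SingleR, since the "no common genes" error quoted in the question points to an empty overlap. The following is a minimal base R sketch under stated assumptions: `test_ids` and `ref_ids` are hypothetical stand-ins for `rownames()` of the converted test object and of the reference, and the IDs are placeholders.

```r
# Hypothetical stand-ins for rownames(test) and rownames(ref) after conversion.
test_ids <- c("ENSG_A", "ENSG_B", "ENSG_D")
ref_ids  <- c("ENSG_B", "ENSG_C", "ENSG_D", "ENSG_E")

# Genes present in both objects, in the order they appear in the test object.
common <- intersect(test_ids, ref_ids)
n_common <- length(common)

# Stop early with a readable message if nothing overlaps.
if (n_common == 0) {
  stop("No shared gene IDs: check that both objects use the same ID type.")
}

message(sprintf("%d of %d test genes are also in the reference",
                n_common, length(test_ids)))
```

A very small overlap, even if not zero, may suggest that the two objects still use different ID types or annotation versions.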

Thanks for your help! Paria

Here is the code that worked for me

read in the ref dataset

reference <- readRDS("DR.rds")

Convert the Ensembl IDs to gene symbols. I found that the best annotation package for this was EnsDb.Hsapiens.v79.

library(EnsDb.Hsapiens.v79)
ens <- mapIds(EnsDb.Hsapiens.v79, keys = reference@assays$RNA@data@Dimnames[[1]], column = 'SYMBOL', keytype = 'GENEID')

length(ens) == length(reference@assays$RNA@data@Dimnames[[1]]) # it should return TRUE

remove NA values from ens

keep <- !is.na(ens)
ens <- ens[keep]

replace the gene name in the seurat object

reference <- reference[keep,]
reference@assays$RNA@data@Dimnames[[1]] <- ens
names(reference@assays$RNA@data@Dimnames[[1]]) <- NULL
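One thing to watch when going from Ensembl IDs to symbols is that several IDs can map to the same symbol, which would leave duplicated gene names after the replacement. The sketch below, in base R with a hand-written stand-in for the mapIds() result (placeholder IDs and symbols, not real annotations), shows two possible ways of handling that. Which one is appropriate is a judgment call and not something the code above settles.

```r
# Hypothetical stand-in for a mapIds() result: names are Ensembl IDs,
# values are symbols. Two IDs share a symbol and one ID has no symbol.
ens <- c(ENSG_A = "GENE1", ENSG_B = "GENE2", ENSG_C = "GENE2", ENSG_D = NA)

# Option 1: keep only mapped genes and the first occurrence of each symbol.
keep <- !is.na(ens) & !duplicated(ens)
symbols_first <- unname(ens[keep])

# Option 2: keep all mapped genes and make repeated symbols unique
# by appending a numeric suffix.
mapped <- unname(ens[!is.na(ens)])
symbols_unique <- make.unique(mapped)

print(symbols_first)
print(symbols_unique)
```

With option 1, `keep` can be used to subset the object in the same way as the `keep` vector above. With option 2, the suffixed names will not match a test dataset that uses plain symbols.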
