3.1 years ago
Zahra
Hi all,
As mentioned earlier in this post, I tried to convert the Ensembl gene IDs to gene symbols. The code below runs without any error, but nrow(ens_to_symbol_biomart) is 55605 while length(ens) is 55602, and I do not understand the reason for this difference. Could you please help me?
Here is my code:
library(TCGAbiolinks)  # provides GDCquery, GDCdownload, GDCprepare
query <- GDCquery(project = "TCGA-COAD",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts",
                  experimental.strategy = "RNA-Seq")
GDCdownload(query)
query.counts.colon <- GDCprepare(query)
Colon.Matrix <- as.data.frame(SummarizedExperiment::assay(query.counts.colon))
ens <- Colon.Matrix$ENS.ID
head(ens)
[1] "ENSG00000000003" "ENSG00000000005" "ENSG00000000419" "ENSG00000000457"
[5] "ENSG00000000460" "ENSG00000000938"
library(org.Hs.eg.db)  # also attaches AnnotationDbi, which provides mapIds
ens_to_symbol <- mapIds(
  org.Hs.eg.db,
  keys = ens,
  column = 'SYMBOL',
  keytype = 'ENSEMBL')
head(ens_to_symbol)
ENSG00000000003 ENSG00000000005 ENSG00000000419 ENSG00000000457 ENSG00000000460
       "TSPAN6"          "TNMD"          "DPM1"         "SCYL3"      "C1orf112"
ENSG00000000938
          "FGR"
library(biomaRt)  # provides useMart, useDataset, getBM
mart <- useDataset('hsapiens_gene_ensembl', useMart('ensembl'))
ens_to_symbol_biomart <- getBM(
  filters = 'ensembl_gene_id',
  attributes = c('ensembl_gene_id', 'hgnc_symbol'),
  values = ens,
  mart = mart)
# Left join: keep every ID in ens, even those without a biomaRt hit
ens_to_symbol_biomart <- merge(
  x = as.data.frame(ens),
  y = ens_to_symbol_biomart,
  by.x = 'ens',
  by.y = 'ensembl_gene_id',
  all.x = TRUE)
head(ens_to_symbol_biomart)
              ens hgnc_symbol
1 ENSG00000000003      TSPAN6
2 ENSG00000000005        TNMD
3 ENSG00000000419        DPM1
4 ENSG00000000457       SCYL3
5 ENSG00000000460    C1orf112
6 ENSG00000000938         FGR
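One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: a left join with merge(..., all.x = TRUE) returns one row per match, so if a few Ensembl IDs come back from getBM with more than one hgnc_symbol, the merged table ends up with more rows than ens has elements. The sketch below reproduces that effect on made-up data. The IDs and symbols (ENSG_A, SYM1, and so on) and the names bm, merged, dup_ids and merged_unique are hypothetical placeholders, not real annotation or objects from the code above.

```r
# Toy stand-ins for the real objects (hypothetical values, not real annotation)
ens <- c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D")

# Pretend annotation table: ENSG_B has two symbols, ENSG_D has no hit at all
bm <- data.frame(
  ensembl_gene_id = c("ENSG_A", "ENSG_B", "ENSG_B", "ENSG_C"),
  hgnc_symbol     = c("SYM1", "SYM2a", "SYM2b", "SYM3"),
  stringsAsFactors = FALSE
)

# Same kind of left join as in the post
merged <- merge(
  x = data.frame(ens = ens, stringsAsFactors = FALSE),
  y = bm,
  by.x = "ens",
  by.y = "ensembl_gene_id",
  all.x = TRUE
)

nrow(merged)  # 5: ENSG_B contributes two rows, ENSG_D one row with NA
length(ens)   # 4

# List the IDs that appear more than once after the merge
dup_ids <- unique(merged$ens[duplicated(merged$ens)])
merged[merged$ens %in% dup_ids, ]

# One possible fix: keep the first symbol per ID, then restore the order of ens
merged_unique <- merged[!duplicated(merged$ens), ]
merged_unique <- merged_unique[match(ens, merged_unique$ens), ]
nrow(merged_unique) == length(ens)
```

Running the same duplicated() check on the real ens_to_symbol_biomart would show whether three extra rows come from IDs with multiple symbols. Keeping only the first symbol is an arbitrary choice; it is roughly what mapIds does by default (multiVals = "first"), which is why ens_to_symbol has exactly one entry per key.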
Dear Hamid, thanks a lot.