Converting Ensembl gene IDs to gene symbols
3.1 years ago
Zahra

Hi all,

As mentioned in an earlier post, I tried to convert Ensembl gene IDs to gene symbols. The code below runs without any error, but nrow(ens_to_symbol_biomart) is 55605 while length(ens) is 55602, and I do not understand the reason for this difference. Could you help me, please?

Here is my code:

library(TCGAbiolinks)

query <- GDCquery(project = "TCGA-COAD",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts",
                  experimental.strategy = "RNA-Seq")
GDCdownload(query)
query.counts.colon <- GDCprepare(query)
Colon.Matrix <- as.data.frame(SummarizedExperiment::assay(query.counts.colon))
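# the assay matrix has genes as rows and samples as columns,
# so the Ensembl gene IDs are its row names
ens <- rownames(Colon.Matrix)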

head(ens)

[1] "ENSG00000000003" "ENSG00000000005" "ENSG00000000419" "ENSG00000000457"
[5] "ENSG00000000460" "ENSG00000000938"



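library(org.Hs.eg.db)
# mapIds() returns exactly one value per key: with multiVals = "first"
# (the default) only the first symbol is kept when an ID maps to several
ens_to_symbol <- mapIds(
  org.Hs.eg.db,
  keys = ens,
  column = 'SYMBOL',
  keytype = 'ENSEMBL')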
head(ens_to_symbol)

ENSG00000000003 ENSG00000000005 ENSG00000000419 ENSG00000000457 ENSG00000000460 
       "TSPAN6"          "TNMD"          "DPM1"         "SCYL3"      "C1orf112" 
ENSG00000000938 
          "FGR"
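The two lookups differ in how they treat an ID with several symbols: `mapIds()` collapses to one value per key because `multiVals` defaults to `"first"` (other options such as `"list"` keep all values), whereas `getBM()` returns every ID/symbol pair. The base-R sketch below mimics both behaviours on a small hypothetical lookup table (the IDs and symbols are borrowed from the reply in this thread, but the table itself is made up) so it runs without Bioconductor or network access.

```r
# Hypothetical lookup table with one row per Ensembl ID / symbol pair,
# i.e. the shape getBM() returns (IDs and symbols borrowed from this thread)
lookup <- data.frame(
  ensembl_gene_id = c("ENSG00000000003", "ENSG00000230417", "ENSG00000230417"),
  hgnc_symbol     = c("TSPAN6", "LINC00595", "LINC00856"),
  stringsAsFactors = FALSE
)
keys <- c("ENSG00000000003", "ENSG00000230417")

# getBM()-like: all matching pairs are kept, so a key can appear more than once
all_pairs <- lookup[lookup$ensembl_gene_id %in% keys, ]

# mapIds(..., multiVals = "first")-like: one value per key, first match wins
first_only <- sapply(keys, function(k) {
  lookup$hgnc_symbol[lookup$ensembl_gene_id == k][1]
})

nrow(all_pairs)     # 3 rows for 2 keys
length(first_only)  # 2, one per key
```

This only imitates the shape of each result; the real `org.Hs.eg.db` and Ensembl BioMart annotations can also disagree on which symbol an ID maps to.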


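library(biomaRt)
mart <- useDataset('hsapiens_gene_ensembl', useMart('ensembl'))
# getBM() returns one row per ID/symbol pair, so an ID with several
# HGNC symbols appears on several rows
ens_to_symbol_biomart <- getBM(
  filters = 'ensembl_gene_id',
  attributes = c('ensembl_gene_id', 'hgnc_symbol'),
  values = ens,
  mart = mart)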

# all.x = TRUE keeps every input ID, including those getBM() did not return
ens_to_symbol_biomart <- merge(
  x = as.data.frame(ens),
  y = ens_to_symbol_biomart,
  by.x = 'ens',
  by.y = 'ensembl_gene_id',
  all.x = TRUE)
head(ens_to_symbol_biomart)

        ens hgnc_symbol
1 ENSG00000000003      TSPAN6
2 ENSG00000000005        TNMD
3 ENSG00000000419        DPM1
4 ENSG00000000457       SCYL3
5 ENSG00000000460    C1orf112
6 ENSG00000000938         FGR
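`merge()` with `all.x = TRUE` keeps every row of `x`, but when a key matches several rows of `y` it repeats that key once per match, so the merged table can be longer than the input vector. A minimal base-R sketch with hypothetical toy data (three unique IDs, one of which has two symbols; not the TCGA object):

```r
# Toy stand-ins (hypothetical): 3 unique IDs, one of which has two symbols
ens <- c("ENSG00000000003", "ENSG00000000005", "ENSG00000276085")
bm <- data.frame(
  ensembl_gene_id = c("ENSG00000000003", "ENSG00000000005",
                      "ENSG00000276085", "ENSG00000276085"),
  hgnc_symbol     = c("TSPAN6", "TNMD", "CCL3L3", "CCL3L1"),
  stringsAsFactors = FALSE
)

merged <- merge(x = data.frame(ens = ens, stringsAsFactors = FALSE), y = bm,
                by.x = "ens", by.y = "ensembl_gene_id", all.x = TRUE)

length(ens)                         # 3 input IDs
nrow(merged)                        # 4 rows: the two-symbol ID is repeated
nrow(merged) - length(unique(ens))  # 1 extra row per additional symbol
```

The same arithmetic applied to the numbers in the question (55605 - 55602) gives 3 extra rows.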
3.1 years ago

Three Ensembl gene IDs map to more than one gene symbol, which is why the number of rows differs from the number of input IDs:

I named the final merged object ens_to_symbol_biomart_merged (instead of ens_to_symbol_biomart):

table(duplicated(ens_to_symbol_biomart_merged$ens))
#FALSE  TRUE 
#55602     3 

# identify duplicated ens
ens_to_symbol_biomart_merged[duplicated(ens_to_symbol_biomart_merged$ens), ]
#                   ens     hgnc_symbol
#29396 ENSG00000230417       LINC00856
#42349 ENSG00000254876 SUGT1P4-STRA6LP
#53705 ENSG00000276085          CCL3L1

# find all symbols for the duplicated ens
dupID <- ens_to_symbol_biomart_merged[duplicated(ens_to_symbol_biomart_merged$ens), ]
ens_to_symbol_biomart_merged[ens_to_symbol_biomart_merged$ens %in% dupID$ens, ]
#                   ens     hgnc_symbol
#29395 ENSG00000230417       LINC00595
#29396 ENSG00000230417       LINC00856
#42348 ENSG00000254876         STRA6LP
#42349 ENSG00000254876 SUGT1P4-STRA6LP
#53704 ENSG00000276085          CCL3L3
#53705 ENSG00000276085          CCL3L1
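If one row per Ensembl ID is needed downstream, two common ways to collapse the duplicates are sketched below in base R on a hypothetical subset of the merged table (symbols taken from the output above). Which symbol counts as "first" depends only on row order, so option 1 is an arbitrary choice; option 2 keeps all symbols in a single string.

```r
# Hypothetical subset of the merged table shown above
merged <- data.frame(
  ens = c("ENSG00000230417", "ENSG00000230417",
          "ENSG00000254876", "ENSG00000254876", "ENSG00000000003"),
  hgnc_symbol = c("LINC00595", "LINC00856",
                  "STRA6LP", "SUGT1P4-STRA6LP", "TSPAN6"),
  stringsAsFactors = FALSE
)

# Option 1: keep only the first row per ID (similar to mapIds' default)
first_hit <- merged[!duplicated(merged$ens), ]

# Option 2: keep every symbol, joined with ";" into one string per ID
collapsed <- sapply(split(merged$hgnc_symbol, merged$ens),
                    function(s) paste(unique(s), collapse = ";"))

nrow(first_hit)                     # 3, one row per unique ID
collapsed[["ENSG00000230417"]]      # "LINC00595;LINC00856"
```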

Dear Hamid, thanks a lot!
