No output when converting transcript IDs to gene symbols
6 months ago
ashkan

I am trying to convert the transcript IDs in one column of my abundance.tsv file to gene symbols with biomaRt, for all rows (I don't mind if a gene symbol appears multiple times), using the following code:

library(biomaRt)

ensembl <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")

data <- read.delim('abundance.tsv')  # tab-separated; read.csv2 would expect ',' as the decimal mark

transcript_ids <- as.character(data$target_id)

result <- getBM(
  attributes = c("ensembl_transcript_id", "ensembl_gene_id","hgnc_symbol"),
  filters = "ensembl_transcript_id",
  values = transcript_ids,
  mart = ensembl
)

The getBM() call returned an empty data frame:

[1] ensembl_transcript_id ensembl_gene_id       hgnc_symbol          
<0 rows> (or 0-length row.names)

To find the problem, I first checked the IDs:

transcript_ids <- as.character(data$target_id)
print(transcript_ids)

The output looks like transcript IDs. Next, I checked that the connection to BioMart works by querying a single ID:

test_result <- getBM(
  attributes = c("ensembl_transcript_id", "ensembl_gene_id", "hgnc_symbol"),
  filters = "ensembl_transcript_id",
  values = "ENST00000489730",
  mart = ensembl
)

and here is the result:

  ensembl_transcript_id ensembl_gene_id hgnc_symbol
1       ENST00000489730 ENSG00000069812        HES2

So the connection works. Then I tried retrieving the annotation for all IDs again:

result <- getBM(
  attributes = c("ensembl_transcript_id", "ensembl_gene_id", "hgnc_symbol"),
  filters = "ensembl_transcript_id",
  values = transcript_ids,
  mart = ensembl
)

and again got an empty result:

[1] ensembl_transcript_id ensembl_gene_id       hgnc_symbol          
<0 rows> (or 0-length row.names)

Do you know how to fix the problem?
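The single-ID test above used a bare accession and worked, while the full vector returned nothing. One frequent cause, stated here as an assumption because the actual target_id values aren't shown, is that kallisto target IDs carry a version suffix (for example ENST00000489730.5) or a full GENCODE header separated by | characters, while the ensembl_transcript_id filter only matches the bare accession. A minimal base-R sketch (the helper name strip_tx_ids is hypothetical) that normalises the IDs before querying:

```r
# Hypothetical helper: reduce kallisto target IDs to bare Ensembl transcript
# accessions so they can match biomaRt's "ensembl_transcript_id" filter.
strip_tx_ids <- function(ids) {
  ids <- as.character(ids)
  # GENCODE FASTA headers look like "ENST...|ENSG...|...": keep the first field
  ids <- sub("\\|.*$", "", ids)
  # Drop a trailing version suffix such as ".5"
  sub("\\.[0-9]+$", "", ids)
}

# Usage with the objects from the question (needs network access):
# transcript_ids <- strip_tx_ids(data$target_id)
# result <- getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id", "hgnc_symbol"),
#                 filters = "ensembl_transcript_id",
#                 values = unique(transcript_ids),
#                 mart = ensembl)
```

If the versions in the file match the Ensembl release behind the mart, querying the unmodified IDs with filters = "ensembl_transcript_id_version" should also work (check listFilters(ensembl)); stripping is more forgiving when the index was built from a different release.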

RNA-seq

Honestly, why the overhead of biomaRt? Get a GTF file that matches the annotation you quantified against, and just do a left join with your transcript IDs; it's quite trivial. If you post a few example lines of your transcript IDs and say which annotation was used, I can suggest code. Do you even need transcript-level annotations, or do you want gene-level counts? If the latter, consider the tximport package.
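The GTF route suggested in the reply can be sketched in base R. This assumes a GTF from the same annotation release used to build the kallisto index, with Ensembl/GENCODE-style transcript_id, gene_id and gene_name attributes on its transcript rows; the helper names and the file name in the usage lines are hypothetical. Versions are stripped on both sides so versioned and unversioned IDs still line up.

```r
# Sketch: transcript -> gene lookup from a GTF, then a left join (base R only).
# Assumes Ensembl/GENCODE-style attributes: transcript_id, gene_id, gene_name.

# Extract the quoted value of one GTF attribute; NA where it is absent
gtf_attr <- function(attrs, key) {
  pattern <- paste0('.*\\b', key, ' "([^"]+)".*')
  hit <- grepl(pattern, attrs, perl = TRUE)
  out <- rep(NA_character_, length(attrs))
  out[hit] <- sub(pattern, "\\1", attrs[hit], perl = TRUE)
  out
}

# Bare accession: drop a GENCODE "|"-header tail and a ".N" version suffix
bare_id <- function(ids) sub("\\.[0-9]+$", "", sub("\\|.*$", "", as.character(ids)))

read_tx2gene <- function(gtf_path) {
  # quote = "" keeps the quoted attribute values intact; "#" skips header lines
  gtf <- read.delim(gtf_path, header = FALSE, comment.char = "#",
                    quote = "", stringsAsFactors = FALSE)
  attrs <- gtf$V9[gtf$V3 == "transcript"]   # one row per transcript
  data.frame(tx_key    = bare_id(gtf_attr(attrs, "transcript_id")),
             gene_id   = gtf_attr(attrs, "gene_id"),
             gene_name = gtf_attr(attrs, "gene_name"),
             stringsAsFactors = FALSE)
}

# Left join: every row of the abundance table is kept; unmatched rows get NA
annotate_abundance <- function(abundance, tx2gene) {
  abundance$tx_key <- bare_id(abundance$target_id)
  merge(abundance, tx2gene, by = "tx_key", all.x = TRUE, sort = FALSE)
}

# Usage (file name is a placeholder):
# tx2gene <- read_tx2gene("annotation.gtf")
# annotated <- annotate_abundance(read.delim("abundance.tsv"), tx2gene)
```

merge(..., all.x = TRUE) is base R's left join, so transcripts missing from the GTF are kept with NA annotations, and a gene symbol naturally repeats for every transcript of that gene. For a full human GTF, rtracklayer::import() will likely be faster than read.delim(). If gene-level counts are the real goal, the first two columns of tx2gene have the transcript-to-gene layout that tximport's tx2gene argument expects, e.g. tximport(files, type = "kallisto", tx2gene = tx2gene[, 1:2], ignoreTxVersion = TRUE), where files is a hypothetical vector of abundance file paths.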
