No output when converting transcript IDs to gene symbols

7 months ago

ashkan ▴ 160

I am trying to convert transcript IDs (one column in my tab-separated file) to gene symbols using biomaRt. I want this for all rows, and I don't mind if a gene symbol appears multiple times. Here are the lines I used:

library(biomaRt)

ensembl <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")

data <- read.delim('abundance.tsv')  # tab-separated, "." decimals (read.csv2 would assume "," decimals)

transcript_ids <- as.character(data$target_id)

result <- getBM(
  attributes = c("ensembl_transcript_id", "ensembl_gene_id","hgnc_symbol"),
  filters = "ensembl_transcript_id",
  values = transcript_ids,
  mart = ensembl
)

The getBM() call returned:

[1] ensembl_transcript_id ensembl_gene_id       hgnc_symbol          
<0 rows> (or 0-length row.names)

To find out what the problem was, I started with:

transcript_ids <- as.character(data$target_id)
print(transcript_ids)

The output was the transcript IDs. Next, I checked that the connection to BioMart works by querying a single transcript:

test_result <- getBM(
  attributes = c("ensembl_transcript_id", "ensembl_gene_id", "hgnc_symbol"),
  filters = "ensembl_transcript_id",
  values = "ENST00000489730",
  mart = ensembl
)

and here is the result:

  ensembl_transcript_id ensembl_gene_id hgnc_symbol
1       ENST00000489730 ENSG00000069812        HES2

So the connection was fine. Then I tried retrieving the data for the full list of IDs again:

result <- getBM(
  attributes = c("ensembl_transcript_id", "ensembl_gene_id", "hgnc_symbol"),
  filters = "ensembl_transcript_id",
  values = transcript_ids,
  mart = ensembl
)

and I got this:

[1] ensembl_transcript_id ensembl_gene_id       hgnc_symbol          
<0 rows> (or 0-length row.names)

Do you know how to fix the problem?
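One common cause is worth checking. This is an assumption, since the post does not show the contents of `target_id`. An `abundance.tsv` with a `target_id` column looks like kallisto output. Against Ensembl or GENCODE transcriptomes, those IDs usually carry a version suffix such as `ENST00000489730.1`, and a GENCODE FASTA may even leave the whole `|`-separated header in place. The `ensembl_transcript_id` filter only matches bare IDs. That would explain why the single hand-typed ID works while the full vector returns 0 rows.

The base-R sketch below strips everything after the first `|` and any trailing `.<version>`, then left-joins the BioMart result back onto the table. The helper name `strip_tx_version`, the `.1` suffix, and the `est_counts` value are hypothetical. The `result` table is the one row the test above returned, so the sketch runs offline.

```r
# Hypothetical helper: reduce IDs like "ENST00000489730.1", or a GENCODE-style
# "ENST00000489730.1|ENSG...|..." FASTA header, to the bare ID that the
# "ensembl_transcript_id" filter matches exactly.
strip_tx_version <- function(ids) {
  ids <- sub("\\|.*$", "", ids)  # keep only the first "|"-separated field
  sub("\\.[0-9]+$", "", ids)     # drop a trailing ".<version>"
}

# Stand-in for read.delim("abundance.tsv"); the ".1" version is invented
data <- data.frame(target_id = "ENST00000489730.1", est_counts = 10)

# Diagnostic: fraction of IDs that are already bare ENST IDs (0 here)
message("bare ENST IDs: ", mean(grepl("^ENST[0-9]+$", data$target_id)))

data$tx_id <- strip_tx_version(data$target_id)

# With a live connection, query BioMart with the stripped IDs:
# result <- getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id", "hgnc_symbol"),
#                 filters = "ensembl_transcript_id",
#                 values = unique(data$tx_id), mart = ensembl)
# Offline stand-in: the row getBM returned for ENST00000489730 in the test above
result <- data.frame(ensembl_transcript_id = "ENST00000489730",
                     ensembl_gene_id = "ENSG00000069812",
                     hgnc_symbol = "HES2")

# Left join: keep every row of data, including transcripts BioMart does not return
annotated <- merge(data, result, by.x = "tx_id", by.y = "ensembl_transcript_id",
                   all.x = TRUE)
print(annotated)
```

Recent Ensembl marts also offer an `ensembl_transcript_id_version` filter. It only matches if the versions in the file equal those of the Ensembl release being queried, so stripping the version tends to be more robust when the index was built from an older release.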

RNA-seq • 272 views

updated 7 months ago by Ram 44k • written 7 months ago by ashkan ▴ 160

Honestly, why the overhead with biomaRt? Get a GTF file that matches your annotation and just do a left join with your transcript IDs; it's quite trivial. If you post a few example lines of transcript IDs and say which annotation was used, I can suggest code. Do you even need transcript-level annotations, or do you want gene-level counts? If the latter, consider using the tximport package.
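The GTF-plus-left-join approach can be sketched in base R as follows. It assumes the GTF has been read with `rtracklayer::import()` and converted with `as.data.frame()`. For Ensembl and GENCODE GTFs, that yields `type`, `transcript_id`, `gene_id` and `gene_name` columns. The helper name `tx2gene_from_gtf`, the file path, the `.1` version and the `tpm` value are hypothetical. The tiny inline tables stand in for a real GTF and `abundance.tsv` so the sketch runs without a download.

```r
# Build a transcript -> gene table from a GTF read into a data frame, e.g.
# gtf <- as.data.frame(rtracklayer::import("annotation.gtf"))  # hypothetical path
tx2gene_from_gtf <- function(gtf) {
  tx <- gtf[gtf$type == "transcript", c("transcript_id", "gene_id", "gene_name")]
  # GENCODE IDs carry ".<version>", Ensembl GTFs usually do not; strip to be safe
  tx$transcript_id <- sub("\\.[0-9]+$", "", tx$transcript_id)
  unique(tx)
}

# Minimal stand-in for an imported GTF (only the columns used above)
gtf <- data.frame(
  type          = c("gene", "transcript", "exon"),
  transcript_id = c(NA, "ENST00000489730.1", "ENST00000489730.1"),
  gene_id       = "ENSG00000069812",
  gene_name     = "HES2"
)
tx2gene <- tx2gene_from_gtf(gtf)

# Stand-in for the quantification table; normalise its IDs the same way
quant <- data.frame(target_id = "ENST00000489730.1", tpm = 1.5)
quant$transcript_id <- sub("\\.[0-9]+$", "", sub("\\|.*$", "", quant$target_id))

# Left join: every quantified transcript is kept, annotated where possible
annotated <- merge(quant, tx2gene, by = "transcript_id", all.x = TRUE)
print(annotated)
```

If gene-level counts are the goal, the first two columns of the same table are what `tximport()` expects as `tx2gene`. For example, with `files` a hypothetical vector of `abundance.tsv` paths: `tximport(files, type = "kallisto", tx2gene = tx2gene[, c("transcript_id", "gene_id")], ignoreTxVersion = TRUE)`.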

7 months ago by ATpoint 86k