Question

Timeout error using Biomart to get gene lengths

2.3 years ago

AlexStar ▴ 170

I have counts data (processed already) and I want to get the lengths of the genes from Biomart, in order to normalize the data to TPM.

I've done this many times in the past, and now I have new data with 50K genes.

This is the code, and it worked fine before:

library(biomaRt)
ensembl <- useEnsembl(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# Transcript and CDS lengths for every transcript of the genes in counts
genelength <- getBM(attributes = c("ensembl_gene_id", "ensembl_transcript_id", "transcript_length", "cds_length"), filters = "ensembl_gene_id", values = rownames(counts), mart = ensembl, useCache = FALSE)
# Canonical-transcript flag for the same genes
gene_canonical_transcript <- getBM(attributes = c("ensembl_gene_id", "ensembl_transcript_id", "transcript_is_canonical"), filters = "ensembl_gene_id", values = rownames(counts), mart = ensembl, useCache = FALSE)
# Keep only rows where the canonical flag is set (not NA)
gene_canonical_transcript_subset <- gene_canonical_transcript[!is.na(gene_canonical_transcript$transcript_is_canonical), ]
# Attach the lengths to each gene's canonical transcript
genelength <- merge(gene_canonical_transcript_subset, genelength, by = c("ensembl_gene_id", "ensembl_transcript_id"))
return(genelength)

Now I'm getting this error:

Error in curl::curl_fetch_memory(url, handle = handle) :
timeout was reached: [uswest.ensembl.org:443] connection timed out after 10000 milliseconds

What is the problem? I know the server doesn't always work because it's used all the time, but this is a new kind of error; I've never seen anything like it before.

And are there any alternatives? Another package maybe?

Thank you

r biomart • 1.4k views

updated 2.3 years ago by Ram 45k • written 2.3 years ago by AlexStar ▴ 170

"with 50K genes."

Public resources always have limits on queries per unit time to keep the service available for all. You may simply be sending the server too many queries in a short time and getting a timeout as a result. Consider adding a pause between sets of queries.
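One way to put a pause between sets of queries is sketched below. This is only an illustration, not something from the original post: `query_in_batches`, `batch_size` and `pause_sec` are hypothetical names, the default values are arbitrary, and the sketch assumes each call to the supplied function returns a data frame with the same columns. Whether batching actually resolves this particular timeout depends on the server side.

```r
# Hypothetical helper (not part of biomaRt): run a query function over
# batches of IDs, sleeping between batches to avoid flooding the server.
query_in_batches <- function(ids, query_fn, batch_size = 500, pause_sec = 1) {
  ids <- unique(ids)
  if (length(ids) == 0) return(data.frame())
  # Split the IDs into consecutive groups of at most batch_size
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    results[[i]] <- query_fn(batches[[i]])
    # Pause between batches, but not after the last one
    if (i < length(batches)) Sys.sleep(pause_sec)
  }
  # Stack the per-batch data frames into one
  do.call(rbind, results)
}

# Possible usage with the query from the question (needs network access,
# so it is left commented out here):
# library(biomaRt)
# ensembl <- useEnsembl(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# genelength <- query_in_batches(rownames(counts), function(batch) {
#   getBM(attributes = c("ensembl_gene_id", "ensembl_transcript_id",
#                        "transcript_length", "cds_length"),
#         filters = "ensembl_gene_id", values = batch,
#         mart = ensembl, useCache = FALSE)
# })
```

Passing the query as a function keeps the pausing logic separate from the biomaRt call, so the same helper could wrap the canonical-transcript query as well.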

2.3 years ago by GenoMax 151k