21 months ago
JACKY
I have count data (already processed) and I want to get gene lengths from BioMart in order to normalize the data to TPM.
I've done this many times in the past, and now I have new data with 50K genes.
This is the code, and it worked fine before:
library(biomaRt)
get_gene_lengths <- function(counts) {
  ensembl = useEnsembl(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
  # Transcript and CDS lengths for every transcript of the genes in the count matrix
  genelength = getBM(attributes = c('ensembl_gene_id', 'ensembl_transcript_id', 'transcript_length', 'cds_length'), filters = 'ensembl_gene_id', values = rownames(counts), mart = ensembl, useCache = FALSE)
  # transcript_is_canonical is 1 for the canonical transcript and NA otherwise
  gene_canonical_transcript = getBM(attributes = c('ensembl_gene_id', 'ensembl_transcript_id', 'transcript_is_canonical'), filters = 'ensembl_gene_id', values = rownames(counts), mart = ensembl, useCache = FALSE)
  gene_canonical_transcript_subset = gene_canonical_transcript[!is.na(gene_canonical_transcript$transcript_is_canonical), ]
  # Keep only the canonical transcript's lengths for each gene
  genelength = merge(gene_canonical_transcript_subset, genelength, by = c("ensembl_gene_id", "ensembl_transcript_id"))
  return(genelength)
}
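For context, here is a minimal sketch of the TPM step these lengths feed into. It assumes the canonical `transcript_length` is used as each gene's length (the query also returns `cds_length`, which is another possible choice), and `counts_to_tpm()` is a hypothetical helper, not part of biomaRt. It divides each count by the gene length in kilobases, then rescales every sample so it sums to one million.

```r
# Hypothetical helper (not from biomaRt): counts is a genes x samples matrix,
# lengths_bp is a numeric vector of gene lengths in bp, in the same row order.
counts_to_tpm <- function(counts, lengths_bp) {
  stopifnot(nrow(counts) == length(lengths_bp), all(lengths_bp > 0))
  # Reads per kilobase: R recycles the vector down each column, so row i
  # is divided by lengths_bp[i]
  rpk <- counts / (lengths_bp / 1000)
  # Rescale each sample (column) so that its values sum to one million
  sweep(rpk, 2, colSums(rpk), "/") * 1e6
}
```

To line the lengths up with the count matrix, something like `genelength$transcript_length[match(rownames(counts), genelength$ensembl_gene_id)]` should work; genes with no canonical transcript in the result come back as `NA` and need to be dropped or handled before calling the helper.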
Now I'm getting this error:
Error in curl::curl_fetch_memory(url, handle = handle) :
timeout was reached: [uswest.ensembl.org:443] connection timed out after 10000 milliseconds
What is the problem? I know the server doesn't always respond because it's used all the time, but I've never had this kind of error before.
And are there any alternatives, such as another package?
Thank you
Public resources generally limit the number of queries per unit of time so that the service stays available to everyone. You may simply be sending the server too many queries in a short time and receiving a timeout as a result. Consider splitting the IDs into batches and adding a pause between them.
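A minimal sketch of that idea in R: `query_in_batches()` is a hypothetical helper, and the batch size of 500, the 2-second pause, and the 3 attempts are arbitrary assumptions, not documented Ensembl limits. It splits the IDs into chunks, runs a query function on each chunk, waits between chunks, and retries a failed chunk with a longer wait before giving up.

```r
# Hypothetical helper: run query_fun on ids in batches, pausing between
# batches and retrying a failed batch before giving up.
query_in_batches <- function(ids, query_fun, batch_size = 500,
                             pause_sec = 2, max_tries = 3) {
  # Split the IDs into consecutive chunks of at most batch_size
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    for (attempt in seq_len(max_tries)) {
      res <- tryCatch(query_fun(batches[[i]]), error = function(e) e)
      if (!inherits(res, "error")) break
      # Out of attempts: re-raise the last error message
      if (attempt == max_tries) stop(conditionMessage(res))
      # Wait a little longer after each failed attempt
      Sys.sleep(pause_sec * attempt)
    }
    results[[i]] <- res
    # Pause between batches so the server is not flooded with requests
    if (i < length(batches)) Sys.sleep(pause_sec)
  }
  # Stack the per-batch data frames into one table
  do.call(rbind, results)
}

# Possible usage with the first query from the question (assumes `ensembl`
# and `counts` already exist):
# genelength <- query_in_batches(rownames(counts), function(ids)
#   getBM(attributes = c('ensembl_gene_id', 'ensembl_transcript_id',
#                        'transcript_length', 'cds_length'),
#         filters = 'ensembl_gene_id', values = ids,
#         mart = ensembl, useCache = FALSE))
```

The query is passed in as a function so the same wrapper can be reused for the canonical-transcript query; if timeouts persist, a smaller `batch_size` or a longer `pause_sec` are the first knobs to try.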