21 months ago
gernophil
Hey everyone,
I just wanted to run a script that worked before. However, every time I try to run it now, RStudio becomes unresponsive. I didn't change anything. Does anyone else experience this?
This is an extract from my script:
library(biomaRt)
...
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)
data_table[, symbol := getBM(mart = mart,
                             attributes = "hgnc_symbol",
                             filters = "ensembl_gene_id",
                             values = `Ensembl Gene ID`)]
Best
I haven't worked much with data.table, but it seems your function sends a separate HTTP request for each gene symbol? If that is truly the case, you might be running into some sort of denial-of-service protection or rate limit on the API, which you would be flooding with a few thousand requests.
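Whatever the cause turns out to be, one way to rule out per-gene requests is to query biomaRt once with all unique IDs and then map the result back by ID. This is a minimal sketch, not a definitive fix. The helper names `fetch_symbol_lookup()` and `map_symbols()` are made up for illustration, and the sketch assumes that `getBM()` may return rows in a different order, or a different number of rows, than the IDs that were sent.

```r
# Hypothetical helper: one batched biomaRt query for all unique IDs.
# Needs the biomaRt package and network access; it is only defined here, not called.
fetch_symbol_lookup <- function(ids) {
  mart <- biomaRt::useMart("ENSEMBL_MART_ENSEMBL")
  mart <- biomaRt::useDataset("hsapiens_gene_ensembl", mart)
  biomaRt::getBM(mart = mart,
                 attributes = c("ensembl_gene_id", "hgnc_symbol"),
                 filters = "ensembl_gene_id",
                 values = unique(ids))
}

# Hypothetical helper: align symbols to the original ID order.
# match() takes the first hit per ID; IDs without a hit become NA.
map_symbols <- function(ids, lookup) {
  lookup$hgnc_symbol[match(ids, lookup$ensembl_gene_id)]
}

# Offline demonstration with a hand-made lookup table (not real query output).
# "ENSG00000000000" is a placeholder ID that is deliberately missing from the lookup.
toy_lookup <- data.frame(
  ensembl_gene_id = c("ENSG00000141510", "ENSG00000012048"),
  hgnc_symbol     = c("TP53", "BRCA1"),
  stringsAsFactors = FALSE
)
toy_ids <- c("ENSG00000012048", "ENSG00000000000", "ENSG00000141510")
toy_symbols <- map_symbols(toy_ids, toy_lookup)
print(toy_symbols)
```

With the table from the question, this would presumably be called as `lookup <- fetch_symbol_lookup(data_table[["Ensembl Gene ID"]])` and the result passed to `map_symbols()` when assigning the `symbol` column. Keeping the ID column in the query result is what makes it possible to line the symbols up with the right rows.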
Are you sure it sends a request for every symbol individually? Shouldn't data.table do this only if you define a function(x) when assigning a new column? I'll try it with the list of symbols and check if it makes a difference.
The problem does not seem to be the getBM() function, but the assignment of the new column in the data.table: that also crashes my RStudio. That's weird. I never had that before.
No, I am not sure, since I have no knowledge regarding the internal workings of data.table.
But this quick (and probably not authoritative) test indicates it might be the case:
The function length() always returns 1, which suggests that the column is broken up into separate invocations. To be sure, I also wrote a custom testfunction() which on purpose does not accept a vectorized parameter. If you run it directly on a vector, you get an error. Since the same function runs like a charm in the data.table, I don't think that more than one value at any given time is provided to the function. It might be that data.table employs some clever logic to distinguish between functions that can operate on vectors and those that can't, but I'd be surprised if that was the case.
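For anyone who wants to repeat this kind of check, here is a minimal sketch. It is my own reconstruction, not the test from the comment above, and the table and column names are invented. As far as I know, data.table evaluates the expression in `j` once on the whole column unless `by` is given, so the outcome may depend on whether the test grouped by row.

```r
library(data.table)

# Toy table; names are invented for illustration
dt <- data.table(id = 1:5, x = letters[1:5])

# No `by`: length() sees the whole column, and the single result is recycled
dt[, n_whole := length(x)]

# Grouped by a unique key: length() is called once per row
dt[, n_by_row := length(x), by = id]

print(dt)
```

If `n_whole` equals the number of rows while `n_by_row` is 1 everywhere, a function placed in `j` without `by` receives the full vector in a single call, which would argue against one request per gene in the original script.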