RSQLite slowness
1
0
Entering edit mode
3.2 years ago
jrl • 0

Dear biostar readers,

why is it so much slower to query an sqlite database using RSQlite «from the outside» using param like

statement <-
 "SELECT * FROM gene2refseq R
  LEFT JOIN gene_info I ON
    I.GeneID = R.GeneID
  WHERE R.`RNA_nucleotide_accession.version`
    LIKE ?"
db <- "gene_info.sqlite"
conn <- DBI::dbConnect(RSQLite::SQLite(), db)
x1 <- DBI::dbGetQuery(conn=conn,
  statement=statement,
  param=list(Håkan20210914$RNANucleotideAccession))

compared to querying «from the inside» of sqlite, by writing your search terms as a table first, and then calling it

statement <-
 "SELECT * FROM H
  LEFT JOIN gene2refseq R ON
    R.`RNA_nucleotide_accession.version`
    LIKE '%' || H.RNANucleotideAccession || '%'
  LEFT JOIN gene_info I ON I.GeneID = R.GeneID"
DBI::dbWriteTable(conn, "H", Håkan20210914)
x2 <- DBI::dbGetQuery(conn=conn, statement=statement)
DBI::dbDisconnect(conn)

On my system (E5-2603 v4), the first query took more than an hour, while the second took only a few minutes ...

Do you guys know of any faster (but also nice) way to dig around in very large tsv files like https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz and https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz ?

This is double-posted from https://stat.ethz.ch/pipermail/r-help/2021-October/472465.html, but I thought some people might know more here compared to r-help ...

Best, Rasmus

tsv RSQLite • 943 views
ADD COMMENT
0
Entering edit mode
3.2 years ago

You've changed several things, the queries are not the same. In the first case you're reading from a file, which might be on a slow disk and might not have indices. Also that param=list() might be VERY long, which would get expanded into a massive SQL query. In the second case you are writing to conn, which might be in memory instead of on disk, and you have no such massive param=list().

ADD COMMENT

Login before adding your answer.

Traffic: 1677 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6