Question

Fast querying of variant metrics with database

0


6 weeks ago

DBScan ▴ 470

I'm trying to implement a Shiny dashboard to display variant metrics, for instance a histogram of DP values across all samples at a single site. Since this should be fast, I was thinking of creating a database (sqlite3) for this task. I have the following approach in mind:

Divide the input VCF into multiple regions
Query each region with the Python package cyvcf2
Store the metrics of interest in a pandas DataFrame
Insert the DataFrame into the SQL database (pandas DataFrame.to_sql)

This works more or less (more less than more), because I want to run multiple threads at the same time. That seems to lock my database, and I end up losing several sites. Are there any other alternative approaches I could take?
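For reference, the pipeline described above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: `parse_region` is a hypothetical stand-in for the cyvcf2 query and returns made-up rows, plain tuples replace the pandas DataFrame, and the table name `site_dp` and its columns are invented. In this variant the worker threads only parse, and a single connection performs every insert, which is one possible way to keep parallel parsing without concurrent writers.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def parse_region(region):
    """Hypothetical stand-in for querying one region with cyvcf2.

    Returns made-up (chrom, pos, sample, dp) rows so the sketch runs
    without a VCF file.
    """
    chrom, start, end = region
    return [(chrom, pos, f"s{i}", 20 + i)
            for pos in range(start, end)
            for i in (1, 2)]


def load_regions(db_path, regions, workers=4):
    """Parse regions in parallel, but insert from this thread only."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS site_dp "
        "(chrom TEXT, pos INTEGER, sample TEXT, dp INTEGER)"
    )
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields each region's rows back to the calling thread,
        # so the workers never touch the database connection.
        for rows in pool.map(parse_region, regions):
            with con:  # one committed transaction per region
                con.executemany(
                    "INSERT INTO site_dp VALUES (?, ?, ?, ?)", rows
                )
    return con
```

Calling `load_regions("metrics.db", [("chr1", 0, 1000), ("chr1", 1000, 2000)])` would fill the table region by region; the file name and region tuples are placeholders.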

python database • 197 views


score 0 · Answer 1 · 2025-02-25

0


6 weeks ago

Pierre Lindenbaum 165k

Are there any other alternative approaches I could take?

Bin the data: use an indexed 'bin' column in your SQL database. The 'Bin' Column Used By Sam, Ucsc...

http://genomewiki.ucsc.edu/index.php/Bin_indexing_system
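A minimal sketch of how such a bin column might be used with sqlite3, assuming the standard UCSC scheme (five levels, smallest bins of 128 kb, chromosomes up to 512 Mb) and 0-based half-open coordinates. The table name `site_dp`, its columns, and the rows in `demo` are made up for illustration.

```python
import sqlite3

# Constants of the standard UCSC binning scheme.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # smallest bins span 2**17 bases (128 kb)
BIN_NEXT_SHIFT = 3    # each level up is 8 times larger


def bin_from_range(start, end):
    """Smallest bin that fully contains the range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("range does not fit the 512 Mb scheme")


def overlapping_bins(start, end):
    """Every bin that could hold a feature overlapping [start, end)."""
    bins = []
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    return bins


def demo():
    """Store a few made-up DP values and fetch those in chr1:[0, 2000)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE site_dp "
        "(chrom TEXT, pos INTEGER, bin INTEGER, sample TEXT, dp INTEGER)"
    )
    con.execute("CREATE INDEX idx_chrom_bin ON site_dp (chrom, bin)")
    rows = [("chr1", 1000, "s1", 31),
            ("chr1", 1000, "s2", 28),
            ("chr1", 500000, "s1", 12)]
    con.executemany(
        "INSERT INTO site_dp VALUES (?, ?, ?, ?, ?)",
        [(c, p, bin_from_range(p, p + 1), s, d) for c, p, s, d in rows],
    )
    bins = overlapping_bins(0, 2000)
    placeholders = ",".join("?" * len(bins))
    query = (
        "SELECT sample, dp FROM site_dp "
        f"WHERE chrom = ? AND bin IN ({placeholders}) "
        "AND pos >= ? AND pos < ? ORDER BY sample"
    )
    return con.execute(query, ["chr1", *bins, 0, 2000]).fetchall()
```

The index on `(chrom, bin)` lets the database jump to a handful of candidate bins, and the `pos` comparison then discards rows in those bins that fall outside the requested window.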
