Fast querying of variant metrics with a database
6 weeks ago
DBScan ▴ 470

I'm trying to implement a Shiny dashboard to display variant metrics, for instance a histogram of DP values across all samples at a single site. Since this needs to be fast, I was thinking of creating a database (sqlite3) for this task. I have the following approach in mind:

  1. Divide the input VCF into multiple regions
  2. Query each region with the Python package cyvcf2
  3. Store the metrics of interest in a pandas DataFrame
  4. Insert the DataFrame into the SQL database (pandas DataFrame.to_sql)

This works more or less (more less than more): I want to run multiple threads at the same time, which somehow locks my database, and I lose several sites. Are there any other alternative approaches I could take?
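For illustration, one way the four steps above are often arranged so that SQLite never sees concurrent writers is to let the worker threads only extract rows and keep every insert on a single connection in the main thread. This is a minimal sketch, not a tested solution for this dataset: `extract_region`, `load_regions`, the `metrics` table and its columns are all hypothetical names, and `extract_region` just returns placeholder rows where the real cyvcf2 query would go.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def extract_region(region):
    """Hypothetical stand-in for the cyvcf2 step.

    Returns placeholder (chrom, pos, sample, dp) rows for a
    0-based, half-open region; real code would read the VCF here.
    """
    chrom, start, end = region
    return [(chrom, pos, "sample1", 0) for pos in range(start, end)]


def load_regions(db_path, regions, workers=4):
    """Extract regions in parallel, but write from one thread only."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics "
        "(chrom TEXT, pos INTEGER, sample TEXT, dp INTEGER)"
    )
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order; only this loop,
        # running in the main thread, ever touches the connection.
        for rows in pool.map(extract_region, regions):
            with conn:  # one transaction per region, committed on exit
                conn.executemany(
                    "INSERT INTO metrics VALUES (?, ?, ?, ?)", rows
                )
    total = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
    conn.close()
    return total
```

With this layout a failed insert raises in the main thread instead of silently dropping sites, so the row count can be checked against the number of sites extracted.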

python database • 197 views
6 weeks ago

Are there any other alternative approaches I could take?

Bin the data: add a 'bin' column, with an index on it, to your SQL table. This is the 'bin' column used by SAM, UCSC...

http://genomewiki.ucsc.edu/index.php/Bin_indexing_system
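As a rough sketch of what that looks like in practice, the functions below follow the standard UCSC scheme described on the linked page (five levels, smallest bins of 128 kb, valid for coordinates below 512 Mb). The function names are my own, coordinates are assumed to be 0-based and half-open, and the table and column names in the usage note are only placeholders.

```python
# Standard UCSC binning constants: the smallest bins span 2**17 bases
# (128 kb) and each level above is 8 times (2**3) larger.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
FIRST_SHIFT = 17
NEXT_SHIFT = 3


def bin_from_range(start, end):
    """Smallest bin that fully contains the interval [start, end)."""
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    raise ValueError("interval out of range for the standard scheme")


def overlapping_bins(start, end):
    """All bins that can hold features overlapping [start, end)."""
    bins = []
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    return bins
```

Each site is stored with `bin_from_range(pos, pos + 1)` in an indexed `bin` column, and a region query then becomes something like `WHERE chrom = ? AND bin IN (...) AND pos >= ? AND pos < ?`, with the bin list coming from `overlapping_bins`.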
