I'm trying to implement a Shiny dashboard to display variant metrics, for instance a histogram of DP values across all samples at a single site. Since this should be fast, I was thinking of creating a database (sqlite3) for this task. I have the following approach in mind:
- Divide the input VCF into multiple regions
- Query each region with the Python package `cyvcf2`
- Store the metrics of interest into a pandas dataframe
- Insert the dataframe into the SQL database (`pandas.DataFrame.to_sql`)
This only partly works, because I want to run multiple threads at the same time. The concurrent writes seem to lock my database, and I lose several sites. Are there any alternative approaches I could take?
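To make the setup concrete, here is a minimal sketch of one variant I could imagine: the worker threads only parse regions, and a single thread does all the inserts, so no two writers ever compete for the database. Everything in it is an assumption for illustration. `fetch_region_metrics` is a hypothetical stand-in for the real `cyvcf2` query and returns made-up values, the `metrics` table layout is invented, and plain `sqlite3` inserts replace the dataframe step to keep the example self-contained.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def fetch_region_metrics(region):
    """Hypothetical placeholder for querying one region with cyvcf2.

    Returns rows of (chrom, pos, sample, dp). The DP values are made up.
    """
    chrom, start, end = region
    return [
        (chrom, pos, f"sample{i}", 10 * i + pos % 7)
        for pos in range(start, end)
        for i in range(3)
    ]


def load_regions(db_path, regions, n_workers=4):
    """Parse regions in parallel, but write from this thread only."""
    conn = sqlite3.connect(db_path)
    # Assumed schema: one row per site and sample.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics "
        "(chrom TEXT, pos INTEGER, sample TEXT, dp INTEGER)"
    )
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map yields each region's rows in submission order;
        # the workers never touch the database connection.
        for rows in pool.map(fetch_region_metrics, regions):
            with conn:  # one transaction per region
                conn.executemany(
                    "INSERT INTO metrics VALUES (?, ?, ?, ?)", rows
                )
    n_rows = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
    conn.close()
    return n_rows
```

Called as `load_regions("variants.db", [("chr1", 0, 1000), ("chr1", 1000, 2000)])`, this returns the number of rows in the table after loading. I have not checked how it behaves on a real VCF.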