Hello,
Thank you for your continuous help so far!
I was wondering whether there is a benefit to running the pathfinding through the PHG for all of my samples together, or whether it does not matter much.
As an example, say I have 15 sets of WGS data (fastq) from 15 different lines and want to run the -ImputePipelinePlugin -imputeTarget path step (or even pathToVCF). I would like to use minimap2 to do the alignments against the pangenome (as it is implemented in the default pipeline), so inputType would be fastq. Is there anything in the pathfinding step itself that speaks against doing this in, e.g., three separate "jobs" with 5 alignments and paths each, i.e. with 3 different keyfiles listing the corresponding fastq file locations?
Or does the imputation of one path help with that of the next one?
(Running PHG v1.2 if it matters)
Thank you again and all the best!
Hello,
thank you again for your help!
This is exactly what I wanted to know: whether the paths are independent. Thank you for clarifying.
I did try it, but wanted to make sure it does not affect the results in the end. What I noticed is that I cannot run it as multiple processes at once, since it seems that only one process at a time can access the database (which makes sense, I suppose). Even running the jobs consecutively should help me, though.
(For information, I am working on a cluster with each node accessing the same data storage. So starting the process on multiple nodes would help me use the resources better.)
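For anyone wanting to do the same split, here is a minimal sketch of how one keyfile could be divided into several smaller ones. It assumes only that the keyfile is a text file with one header line followed by one row per fastq entry; the function name and the output file names are made up for illustration and are not part of the PHG.

```python
from pathlib import Path


def split_keyfile(keyfile, n_jobs, out_dir):
    """Split a keyfile into up to n_jobs smaller keyfiles.

    Assumption: the first line is a header and every following non-empty
    line describes one fastq entry. The header is repeated in each part.
    """
    lines = Path(keyfile).read_text().splitlines()
    header = lines[0]
    rows = [ln for ln in lines[1:] if ln.strip()]

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for i in range(n_jobs):
        chunk = rows[i::n_jobs]  # round-robin keeps the parts balanced
        if not chunk:
            continue  # fewer rows than jobs: skip empty parts
        out = out_dir / f"keyfile_part{i + 1}.txt"  # hypothetical naming
        out.write_text("\n".join([header, *chunk]) + "\n")
        written.append(out)
    return written
```

With 15 rows and `n_jobs=3` this writes three keyfiles of 5 rows each, which can then be passed to three consecutive runs. Note that a sample with paired-end reads spread over several rows would need its rows kept together, which this simple version does not check.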
If you are running with an SQLite DB I believe it does lock you into one connection at a time. The PathFinding portion of the PHG after you have the alignments is actually fairly fast and multithreaded so it should not take a ton of time even if using a single machine.
The alignment step of the imputation does take some time, and we have been working on new strategies to deploy it better on cluster systems so that you can use multiple nodes, but this is still in the testing phase. We will likely add it to a future release of the PHG.
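To illustrate the general SQLite behaviour mentioned above (not the PHG's own code), here is a small sketch using Python's standard sqlite3 module. While one connection holds the write lock, a second connection that tries to write fails with "database is locked". The table name `paths` is purely illustrative and is not the PHG schema, and whether the error you saw comes from exactly this lock is an assumption.

```python
import sqlite3


def second_writer_is_blocked(db_path):
    """Return True if a second connection cannot write while the first
    connection holds an open write transaction on the same SQLite file."""
    # isolation_level=None lets us control transactions explicitly.
    first = sqlite3.connect(db_path, isolation_level=None)
    first.execute("CREATE TABLE IF NOT EXISTS paths (line_name TEXT)")
    first.execute("BEGIN IMMEDIATE")  # take the write lock right away
    first.execute("INSERT INTO paths VALUES ('line_01')")

    # timeout=0: fail immediately instead of waiting for the lock.
    second = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        second.execute("INSERT INTO paths VALUES ('line_02')")
        blocked = False
    except sqlite3.OperationalError:  # "database is locked"
        blocked = True
    finally:
        second.close()
        first.execute("COMMIT")
        first.close()
    return blocked
```

Calling `second_writer_is_blocked("demo.db")` on a scratch file shows the second writer being rejected while the first transaction is open, which matches what happens when two jobs write to the same SQLite file at once.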
Thank you again for the answer!
Would multiple connections be possible using a "postgres" database?