Dear all,
Has anyone found a good way of speeding up vg call, by chunking or another method?
My current chunking seems to give only a very modest speedup of 0-20%, so maybe it is not the right approach.
Specifically, I have aligned with vg giraffe and used the following code to chunk the resulting GAM file into sets of 100000 reads.
Commands from a Nextflow script:
vg chunk -t $task.cpus --gam-split-size $params.gam_split_size -a $gam
vg pack -t $task.cpus -x $xg -g $chunked_gam -Q5 -o ${prefix}.aln.pack
vg call -t $task.cpus $xg -k ${prefix}.aln.pack --min-support $params.min_support -s $sample_name > ${prefix}.vcf
I've read lots of docs but am not sure what's up to date.
- There is a complicated vg_toil script here; however, it is from 2020 and I don't know whether it is still current, so I'm a bit wary.
Thanks
vg call took me 3 days and ultimately failed because it hit the time limit (I submitted the job through Slurm, which enforces a maximum duration). Is this considered normal? The command I used is:
The log was silent and the output VCF was empty.
The sizes of the input files are:
When running vg call on .gbz input, you can often see a major speedup by adding -z to limit it to haplotypes present in the .gbz. For example, this makes it 100s of times faster for the HPRC graphs.
Thanks, I'll try it. But I also have a vg call process that takes xg, xg.pack, xg.snarls as input. It also ran for 3 days without writing anything to its VCF. It seemed that no matter how much time elapsed, it would never finish, and I even began to question whether the process was stuck. Is this normal? Is there any way to determine whether a process is stuck or not?
Update on Aug 2nd:
I currently have two instances of the vg call command running. I have been checking their memory usage every 12 hours, and it has remained virtually unchanged. This adds to my concern that the processes may be stuck. I eagerly await your assistance. Thanks.
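On the "is it stuck?" question, one rough check that is not specific to vg is to see whether the process is still accumulating CPU time, since flat memory usage alone says little. The sketch below assumes a Linux node with /proc available; the function names are made up for illustration. Note the limits: a process that is spinning in a loop without getting anywhere will still look "busy", so this only separates "computing" from "blocked or idle".

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not part of vg or Slurm).

# Print utime + stime, in clock ticks, for a PID (all threads combined).
cpu_ticks() {
    local pid=$1 stat
    stat=$(cat "/proc/$pid/stat" 2>/dev/null) || return 1
    # Drop "pid (comm) " so spaces in the command name cannot shift the fields.
    stat=${stat##*') '}
    local -a fields
    read -r -a fields <<< "$stat"
    # fields[0] is field 3 (state), so utime (14) and stime (15) are at 11 and 12.
    echo $(( fields[11] + fields[12] ))
}

# Succeed if the PID used any CPU time during the sampling interval (seconds).
is_progressing() {
    local pid=$1 interval=${2:-60} before after
    before=$(cpu_ticks "$pid") || return 2
    sleep "$interval"
    after=$(cpu_ticks "$pid") || return 2
    (( after > before ))
}
```

For example, `is_progressing 12345 60 && echo "still using CPU"` samples PID 12345 (a placeholder) one minute apart. It has to run on the compute node where the job is executing.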
Interesting idea. So my Nextflow code is like this at the moment; can I just change the $xg and $gbwt to $gbz, add -z, and get the speedup shown here?
Edit: it looks like it worked. On a test 25k Arabidopsis example, the runtime dropped from 4m14 to 2m38. Thanks!
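For anyone making the same swap, here is a minimal sketch of the two invocations being compared. It only prints the command line rather than running vg, and the helper name, argument order and file names are assumptions for illustration, not taken from the pipeline above. It uses only flags already mentioned in this thread (-k, -s, -g, -z).

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints a vg call command, does not execute it.
# Usage: vg_call_cmd GRAPH PACK SAMPLE [GBWT]
vg_call_cmd() {
    local graph=$1 pack=$2 sample=$3 gbwt=${4:-}
    local -a cmd=(vg call "$graph" -k "$pack" -s "$sample")
    if [[ $graph == *.gbz ]]; then
        # GBZ input: -z is a bare flag and takes no argument.
        cmd+=(-z)
    elif [[ -n $gbwt ]]; then
        # XG input: -g takes the GBWT file as its argument.
        cmd+=(-g "$gbwt")
    fi
    echo "${cmd[*]}"
}

vg_call_cmd graph.gbz sample.pack SAMPLE
vg_call_cmd graph.xg sample.pack SAMPLE graph.gbwt
```

The two calls print `vg call graph.gbz -k sample.pack -s SAMPLE -z` and `vg call graph.xg -k sample.pack -s SAMPLE -g graph.gbwt` respectively.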
-z does not take an argument. vg call graph.gbz -z should be exactly equivalent to vg call graph.xg -g graph.gbwt (i.e. -g should give the same speedup as -z). If you are seeing different runtimes, I suggest double-checking your output.
To Glenn:
Could you tell me which species the VG team has run vg call on, along with the time and resources each run required? This would give me some insight into how I should plan my workflow.
Furthermore, I am considering constructing graphs and calling variants on a per-chromosome basis. Is this theoretically feasible?
Thanks