vg autoindex is running, but memory consumption stays flat and the log files are not updated
2.9 years ago

Hi there, I am using vg for Giraffe mapping.

I started vg autoindex on 2022-01-13 and it is still running now (2022-01-17).

vg writes log/intermediate files: a chunked FASTA, chunked VCF files, etc.

I have checked the log files, and they are not updated. So I used the Linux top command to check vg's CPU and memory consumption.

vg consumes CPU but does not consume memory. At first I thought vg was running normally.

But the run time is very long, so now I suspect that vg is not actually making progress.

How can I check what vg is actually doing?

My vg command is below.

vg autoindex --workflow giraffe --prefix ./hg38_based_variation_graph/hg38_pangenome -r ./fasta/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta -v ./vcf/test.vcf.gz -T ./hg38_based_variation_graph/ --threads 2 --target-mem 35G --verbosity 1
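For reference, this is how I have been checking on the run so far. These are just generic Linux commands, not anything vg-specific (the directory is the -T / --prefix directory from my command above, and the pgrep pattern is only my guess at how to match the process):

# list the intermediate files vg has written, newest first
ls -lth ./hg38_based_variation_graph/

# total size of the working directory; if chunking is progressing, this should grow
du -sh ./hg38_based_variation_graph/

# CPU and memory use of the running vg process
top -p $(pgrep -f 'vg autoindex')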
autoindex vgteam vg giraffe
2.9 years ago

There are steps in the indexing pipeline that do take a significant amount of time but not much memory, so this very well might be expected. It's hard for me to say anything for sure without seeing your log file though. Perhaps you could try again with --verbosity 2 and send the output?


Thanks for your comment! I tried the command with --verbosity 2. I saw the same messages, and the log files were still not updated; they were identical to before.

The output messages are below:

[IndexRegistry]: Checking for phasing in VCF(s).
[IndexRegistry]: Provided: VCF w/ Phasing
[IndexRegistry]: Chunking inputs for parallelism.
[IndexRegistry]: Chunking FASTA(s).
[IndexRegistry]: Chunking VCF(s).

--> no change over 2 days


That is a step where we expect longer compute times with very little memory use though, which is consistent with what you are seeing. Are you using a very large VCF file? One thing that might help is to break it into separate VCFs by contig. Otherwise, there aren't a lot of good alternatives to just reading the entire file serially.
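For example, splitting a bgzipped, tabix-indexed VCF by contig can be done with tabix and bcftools along these lines (a rough sketch, assuming the file is test.vcf.gz and is already indexed; check vg autoindex --help for how to pass several VCFs):

# list the contigs present in the index
tabix -l test.vcf.gz

# write one compressed VCF per contig and index each of them
for chrom in $(tabix -l test.vcf.gz); do
    bcftools view -r "$chrom" -Oz -o test.${chrom}.vcf.gz test.vcf.gz
    tabix -p vcf test.${chrom}.vcf.gz
done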

15 months ago
Maxine

I encountered the exact same issue. The reason may not be that the VCF file is too large; rather, it may be that there are too many contigs. Here's why I think so:

I attempted to use two different versions of VCF files and the same reference genome to construct a giraffe graph separately using the following command:

vg autoindex --workflow giraffe -r $ref_genome -v $input_vcf -p bufo -t $SLURM_CPUS_PER_TASK -M ${target_mem}M

The difference between the two VCF files is that VCF1 only includes structural variants (SVs) located on 16 chromosomes, while VCF2 includes the same SVs as VCF1 plus additional SVs located on 700+ scaffolds. Although 700+ scaffolds may seem like a lot, there are only 66k SVs on those scaffolds compared to a total of 1.4 million SVs on the 16 chromosomes. In other words, VCF1 would be divided into 16 chunks, while VCF2 would be divided into 700+ chunks.
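A quick way to see how the records are distributed across contigs is the per-contig summary stored in the VCF index (a sketch; bcftools index -s prints contig, length, and number of records, and VCF2.vcf.gz here just stands in for my second, bgzipped and indexed VCF):

# per-contig record counts: contig <TAB> length <TAB> number of records
bcftools index -s VCF2.vcf.gz

# number of contigs that actually carry variants
bcftools index -s VCF2.vcf.gz | wc -l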

The autoindex run with VCF1 completed in 3 hours, but the run with VCF2 is still stuck at "[IndexRegistry]: Chunking VCF(s)." after 4 hours. I suspect there might be a limit on the number of chunks, which could be causing this freeze.

I expect better mapping results with the scaffolds included, so I would prefer not to lose that information. Are there any possible solutions?
