Question

warning[vg::Watchdog]

2.1 years ago

hxt • 0

**I am trying to use vg giraffe to map paired-end short reads against my pan-genome graph, and I get the warnings below.

Could you tell me what I can do?**
warning[vg::Watchdog]: Thread 16 has been checked in for 10 seconds processing: ERR626576.4149, ERR626576.4149
warning[vg::Watchdog]: Thread 17 has been checked in for 10 seconds processing: ERR626576.7602, ERR626576.7602
warning[vg::Watchdog]: Thread 18 has been checked in for 10 seconds processing: ERR626576.6088, ERR626576.6088
warning[vg::Watchdog]: Thread 20 has been checked in for 10 seconds processing: ERR626576.12421, ERR626576.12421
warning[vg::Watchdog]: Thread 8 has been checked in for 10 seconds processing: ERR626576.29760, ERR626576.29760
warning[vg::Watchdog]: Thread 13 has been checked in for 10 seconds processing: ERR626576.27768, ERR626576.27768
warning[vg::Watchdog]: Thread 0 has been checked in for 10 seconds processing: ERR626576.332252, ERR626576.332252
warning[vg::Watchdog]: Thread 4 has been checked in for 10 seconds processing: ERR626576.59052, ERR626576.59052
warning[vg::Watchdog]: Thread 21 has been checked in for 10 seconds processing: ERR626576.70060, ERR626576.70060
warning[vg::Watchdog]: Thread 1 has been checked in for 10 seconds processing: ERR626576.75464, ERR626576.75464
warning[vg::Watchdog]: Thread 3 has been checked in for 10 seconds processing: ERR626576.86319, ERR626576.86319
warning[vg::Watchdog]: Thread 19 has been checked in for 10 seconds processing: ERR626576.85706, ERR626576.85706
warning[vg::Watchdog]: Thread 22 has been checked in for 10 seconds processing: ERR626576.108047, ERR626576.108047
warning[vg::Watchdog]: Thread 13 finally checked out after 21 seconds and 56384 kb memory growth processing: ERR626576.27768, ERR626576.27768
warning[vg::Watchdog]: Thread 14 has been checked in for 10 seconds processing: ERR626576.126421, ERR626576.126421

vg Watchdog giraffe • 2.1k views

updated 2.1 years ago by colindaven 7.7k • written 2.1 years ago by hxt • 0

This is just a warning, not an error. Did you get an error in the end?

2.1 years ago by colindaven 7.7k

Thank you for your reply!

No error was reported in the end, because the command never ran to completion.

I ran this command: vg giraffe -t 24 -Z 63-pg.d2.gbz -d 63-pg.d2.dist -m 63-pg.d2.min -f IRIS_313-7684_1.QC.fastq.gz -f IRIS_313-7684_2.QC.fastq.gz > IRIS_313-7684_mapped.gam

When I first started running this command, the size of the GAM file kept increasing, but after the warning messages appeared, it stopped growing. This lasted all night, so I terminated the command. I don't think vg giraffe should take this long.
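
One way to tell a stalled run from a slow one is to save the warnings and count them. The following is only a minimal sketch. It assumes stderr was redirected to a file (the name giraffe.log is made up) by appending 2> giraffe.log to the command above. watchdog_summary is an illustrative helper, not part of vg, and it relies only on the two message wordings visible in the log in this thread.

```shell
# Illustrative helper (not part of vg): summarise vg::Watchdog lines
# found in a saved stderr log.
watchdog_summary() {
  log_file="$1"
  # Warnings about reads that have been processing for a long time.
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
  slow=$(grep -c 'has been checked in for' "$log_file" || true)
  # Warnings about slow reads that did eventually finish.
  finished=$(grep -c 'finally checked out after' "$log_file" || true)
  echo "slow_reads=${slow} finished_late=${finished}"
}

# Hypothetical usage (giraffe.log is an assumed file name):
# watchdog_summary giraffe.log
```

If finished_late keeps growing between two calls, slow reads are probably completing rather than hanging forever.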

2.1 years ago by hxt • 0

I would expect vg giraffe to be slow. For me, it's almost impossible to use. In my experience, the only efficient (that is, fast) short-read aligner for pangenomes is minigraph, and to my knowledge it sadly only works on rGFA pangenomes, not GFA ones.

Speed depends on the following:

- your machine's CPU speed and RAM
- the size of your input data (try to get it to work with 1M reads before using the whole dataset, e.g. using head -n 4000000 R1.fastq > small_R1.fastq, since each FASTQ record is 4 lines)
- the complexity and size of your pangenome
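
The subsampling suggestion above uses an uncompressed FASTQ file, whereas the input files in this thread are gzipped and paired. A minimal sketch for that case follows. subsample_fastq is an illustrative helper and the small_*.fastq.gz output names are made up. It assumes 4 lines per FASTQ record and that both mate files list their reads in the same order.

```shell
# Illustrative helper: keep the first N reads of a gzipped FASTQ file.
# Assumes every record is exactly 4 lines (no wrapped sequence lines).
subsample_fastq() {
  in_fq="$1"
  n_reads="$2"
  out_fq="$3"
  # Decompress, keep the first N records (4 lines each), recompress.
  gzip -dc "$in_fq" | head -n "$((n_reads * 4))" | gzip > "$out_fq"
}

# Hypothetical usage with the input names from this thread
# (output names are assumptions):
# subsample_fastq IRIS_313-7684_1.QC.fastq.gz 1000000 small_1.fastq.gz
# subsample_fastq IRIS_313-7684_2.QC.fastq.gz 1000000 small_2.fastq.gz
```

Taking the same number of reads from the start of both files keeps mates paired, so the two small files can be passed to the same mapping command in place of the full ones.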

2.1 years ago by colindaven 7.7k

Your reply is very useful! I think the problem is most likely the third reason above:

complexity and size of your pangenome

My pan-genome was built using the minigraph-cactus pipeline. I have two pan-genomes, one built with 63 genomes and the other with 5. When I run the vg giraffe command using the pan-genome built with 5 genomes, it finishes successfully in a relatively short time without any warning messages. But when I use the pan-genome built with 63 genomes, the warning messages above appear. I want to run vg giraffe using the pan-genome built with 63 genomes. What should I do?

Thanks!

2.1 years ago by hxt • 0

Often the hard part is graph construction. It's easy to build a graph that represents some alignment between the sequences, but what you really want is a graph (an alignment) that is useful in the intended application.

For Giraffe, that means avoiding complex structures at every level of detail while simultaneously trying to minimize sequence duplication. The Minigraph–Cactus pipeline achieves that, at least if the sequences resemble human genomes and there are not too many of them. For a new species, graph construction can be a major project until you find the right way to do it.

Minigraph–Cactus produces several versions of the graph. With a few haplotypes (like 5), you want to use the default (clip) graph for mapping. With more haplotypes (like 63), it's better to use the filter graph that drops every node visited by only one haplotype.

There is also an experimental option of using a sample-specific subgraph that consists of a few synthetic haplotypes. You start from the clip graph, count kmers in the reads, and generate a few synthetic haplotypes that should be similar to the sequenced sample. With human genomes, this approach is faster and produces better variant calling results than the filter graph. Nobody has tried it with other species yet, and we are not confident that the current approach handles structural variants properly.

2.1 years ago by Jouni Sirén ▴ 710

To be honest, I don't think it's going to work with a large, complex pangenome. As I said, I have had big performance problems trying to map short reads to plant pangenomes.

- minigraph works with short or long reads (but the rGFA format is hard to work with downstream)
- vg giraffe does not work well with any PGGB pangenome I have created (even for small genomes like A. thaliana)
- with minigraph-cactus I'm now (since cactus 2.6.1 with odgi output) seeing better and faster results when mapping with vg giraffe
- GraphAligner seems good for mapping long reads rapidly and robustly
- my test pangenomes are all 3-5 plant genomes

I don't think short-read mapping is a solved problem yet for pangenomics; others are looking at non-mapping tools like PanGenie.

2.1 years ago by colindaven 7.7k