Hello,
When I use vg giraffe to map reads that I simulated with vg sim, I keep getting these warnings:
warning[vg::giraffe]: Finalizing fragment length distribution before reaching maximum sample size
mapped 98 reads single ended with 2 pairs of reads left unmapped
mean: 0, stdev: 1
warning[vg::giraffe]: Cannot cluster reads with a fragment distance smaller than read distance
Fragment length distribution: mean=0, stdev=1
Fragment distance limit: 2, read distance limit: 200
warning[vg::giraffe]: Falling back on single-end mapping
My guess is that vg giraffe has trouble mapping paired-end reads that overlap one another (insert size < total read length). Is that the cause?
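A rough check on that guess, assuming vg sim draws fragment lengths from a normal distribution with mean -p and standard deviation -v (I have not confirmed this in the vg source). The helper name overlap_fraction is made up for illustration. It estimates the fraction of fragments shorter than two read lengths, using the -l 150 -p 570 -v 165 values from the vg sim command in the reproduction steps:

```shell
# Hypothetical helper: fraction of fragments shorter than 2 * read_len,
# ASSUMING fragment lengths are normally distributed with the given mean and sd.
overlap_fraction() {
    awk -v len="$1" -v mean="$2" -v sd="$3" 'BEGIN {
        z = (2 * len - mean) / sd      # overlap cutoff in standard deviations
        pi = atan2(0, -1)
        step = 0.0001
        # Midpoint-rule integral of the standard normal density from -10 to z
        for (x = -10; x < z; x += step) {
            m = x + step / 2
            p += exp(-m * m / 2) / sqrt(2 * pi) * step
        }
        printf "%.3f\n", p
    }'
}

# Read length 150, fragment mean 570, fragment sd 165 (the vg sim settings)
overlap_fraction 150 570 165
```

Under that assumption only around 5% of pairs would have overlapping mates, so overlap alone may not explain a fragment length estimate of mean=0.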
Code to reproduce the warnings:
# Construct graph using minigraph-cactus example
# https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/doc/pangenome.md#yeast-graph
cactus-pangenome ./js ./examples/yeastPangenome.txt --reference S288C --outDir yeast-pg --outName yeast-pg --vcf --giraffe
# Simulate reads from the graph
vg sim -x yeast-pg.d2.gbz -n 100 -l 150 -p 570 -v 165 -s 3 --multi-position -r -a > simreads.gam
# Extract fastq of simulated reads
vg view -a simreads.gam -X -i > simreads.fastq
# Remap the reads to the graph
vg giraffe -t 30 -Z yeast-pg.d2.gbz -m yeast-pg.d2.min -d yeast-pg.d2.dist -i -f simreads.fastq > mapped.gam
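Since giraffe -i expects mates to sit next to each other in the FASTQ, here is a minimal sanity check for the extracted file. It is only a sketch: check_interleaved is a made-up helper name, it assumes four-line FASTQ records, and it assumes mates share a read name apart from a trailing /1, /2, _1 or _2 (I am not certain that this is how vg sim names its reads).

```shell
# Hypothetical helper: verify that a FASTQ file looks interleaved.
# ASSUMPTIONS: 4 lines per record; mate names differ only by a trailing
# /1, /2, _1 or _2 suffix.
check_interleaved() {
    awk 'NR % 4 == 1 {
             name = substr($1, 2)              # drop the leading "@"
             sub("[/_][12]$", "", name)        # drop the assumed mate suffix
             n++
             if (n % 2 == 1) { first = name }  # first mate of a pair
             else if (name != first) { bad++ } # second mate must match it
         }
         END {
             if (n % 2 == 1) { print "odd number of records: " n; exit 1 }
             if (bad > 0)    { print bad " mismatched pairs"; exit 1 }
             print n / 2 " pairs look interleaved"
         }' "$1"
}
```

It could be run as check_interleaved simreads.fastq before the giraffe step; a non-zero exit status means the file does not look interleaved under the assumptions above.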
Warnings specific to this example:
warning[vg::giraffe]: Finalizing fragment length distribution before reaching maximum sample size
mapped 93 reads single ended with 7 pairs of reads left unmapped
mean: 0, stdev: 1
warning[vg::giraffe]: Cannot cluster reads with a fragment distance smaller than read distance
Fragment length distribution: mean=0, stdev=1
Fragment distance limit: 2, read distance limit: 200
warning[vg::giraffe]: Falling back on single-end mapping
Despite the warnings that 93 reads were mapped single-ended and that giraffe fell back on single-end mapping, vg stats -a mapped.gam shows that most reads are properly paired:
Total alignments: 200
Total primary: 200
Total secondary: 0
Total aligned: 200
Total perfect: 170
Total gapless (softclips allowed): 200
Total paired: 200
Total properly paired: 186
Alignment score: mean 159.05, median 160, stdev 2.80134, max 160 (170 reads)
Mapping quality: mean 56.15, median 60, stdev 14.6054, max 60 (187 reads)
Insertions: 0 bp in 0 read events
Deletions: 0 bp in 0 read events
Substitutions: 38 bp in 38 read events
Softclips: 0 bp in 0 read events
Total time: 0.0225248 seconds
Speed: 8879.12 reads/second
Strangely, I’ve never run into these warnings when mapping real reads to a graph, only when using vg sim. Any insight into what’s going on here? Thank you for your support and for making such an incredible set of tools!