warning[vg::giraffe]: Cannot cluster reads with a fragment distance smaller than read distance
10 months ago
cassiwatt

Hello,

When I use vg giraffe to map reads that I simulated with vg sim, I keep getting these warnings:

warning[vg::giraffe]: Finalizing fragment length distribution before reaching maximum sample size
                      mapped 98 reads single ended with 2 pairs of reads left unmapped
                      mean: 0, stdev: 1
warning[vg::giraffe]: Cannot cluster reads with a fragment distance smaller than read distance
                      Fragment length distribution: mean=0, stdev=1
                      Fragment distance limit: 2, read distance limit: 200
warning[vg::giraffe]: Falling back on single-end mapping

My guess is that vg giraffe has trouble mapping paired-end reads that overlap one another (insert size < total read length). Is that what is happening?

Code to reproduce the warnings:

# Construct graph using minigraph-cactus example
# https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/doc/pangenome.md#yeast-graph
cactus-pangenome ./js ./examples/yeastPangenome.txt --reference S288C --outDir yeast-pg --outName yeast-pg --vcf --giraffe

# Simulate reads from the graph
vg sim -x yeast-pg.d2.gbz -n 100 -l 150 -p 570 -v 165 -s 3 --multi-position -r -a > simreads.gam

# Extract fastq of simulated reads
vg view -a simreads.gam -X -i > simreads.fastq

# Remap the reads to the graph
vg giraffe -t 30 -Z yeast-pg.d2.gbz -m yeast-pg.d2.min -d yeast-pg.d2.dist -i -f simreads.fastq > mapped.gam
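
As a sanity check on the overlap idea, here is a minimal sketch of the arithmetic behind the simulation settings. It assumes that -l is the per-read length, -p the mean fragment length, and -v the fragment length standard deviation, as I understand the vg sim help text; the variable names are only illustrative.

```shell
# Values copied from the vg sim command above
read_len=150     # -l
frag_mean=570    # -p
frag_sd=165      # -v

# Two mates can overlap only if the fragment is shorter than both reads combined
pair_span=$((2 * read_len))
gap=$((frag_mean - pair_span))
echo "combined read length: ${pair_span}"
echo "mean gap between mates: ${gap}"

# How many standard deviations below the mean a fragment must fall before mates overlap
awk -v g="$gap" -v sd="$frag_sd" 'BEGIN { printf "overlap threshold: %.2f sd below mean\n", g / sd }'
```

Under those assumptions the mean fragment leaves a 270 bp gap between mates, so only the short tail of the simulated distribution would produce overlapping pairs.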

Warnings specific to this example:

warning[vg::giraffe]: Finalizing fragment length distribution before reaching maximum sample size
                      mapped 93 reads single ended with 7 pairs of reads left unmapped
                      mean: 0, stdev: 1
warning[vg::giraffe]: Cannot cluster reads with a fragment distance smaller than read distance
                      Fragment length distribution: mean=0, stdev=1
                      Fragment distance limit: 2, read distance limit: 200
warning[vg::giraffe]: Falling back on single-end mapping

Despite the warning that most of the reads were mapped single-ended, vg stats -a mapped.gam reports that most of them are properly paired:

Total alignments: 200
Total primary: 200
Total secondary: 0
Total aligned: 200
Total perfect: 170
Total gapless (softclips allowed): 200
Total paired: 200
Total properly paired: 186
Alignment score: mean 159.05, median 160, stdev 2.80134, max 160 (170 reads)
Mapping quality: mean 56.15, median 60, stdev 14.6054, max 60 (187 reads)
Insertions: 0 bp in 0 read events
Deletions: 0 bp in 0 read events
Substitutions: 38 bp in 38 read events
Softclips: 0 bp in 0 read events
Total time: 0.0225248 seconds
Speed: 8879.12 reads/second
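
To put a number on "most", the properly paired fraction can be computed from the two relevant lines of that report. This is a minimal sketch that hard-codes the lines from the output above so it runs without vg; in practice the same awk filters could read the output of vg stats -a mapped.gam directly.

```shell
# Two lines copied from the vg stats -a output above
stats='Total paired: 200
Total properly paired: 186'

# Pull the count after the colon from each line
paired=$(printf '%s\n' "$stats" | awk -F': ' '/^Total paired/ { print $2 }')
proper=$(printf '%s\n' "$stats" | awk -F': ' '/^Total properly paired/ { print $2 }')

# Percentage of paired alignments flagged as properly paired
pct=$(awk -v p="$proper" -v t="$paired" 'BEGIN { printf "%.1f", 100 * p / t }')
echo "properly paired: ${proper}/${paired} (${pct}%)"
```

For this run that is 186 of 200 alignments, or 93.0%.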

Strangely, I’ve never run into these warnings when mapping real reads to a graph, only when using vg sim. Any insight into what’s going on here? Thank you for your support and for making such an incredible set of tools!

giraffe sim vg • 348 views