13 months ago
shehzad_99
Hi, I used vg giraffe to align long-read HiFi sequences to a pangenome, and these are the numbers I get from running vg stats:
Total alignments: 4144653
Total primary: 4144653
Total secondary: 0
Total aligned: 4144644
Total perfect: 420948
Total gapless (softclips allowed): 3318916
Total paired: 0
Total properly paired: 0
Alignment score: mean 4237.06, median 3312, stdev 3333.28, max 25623 (1 reads)
Mapping quality: mean 58.526, median 60, stdev 8.54529, max 60 (4010647 reads)
Insertions: 1151521 bp in 768106 read events
Deletions: 1222803 bp in 772070 read events
Substitutions: 21231696 bp in 21231696 read events
Softclips: 43549949152 bp in 5046954 read events
Total time: 498646 seconds
Speed: 8.31181 reads/second
My team believes that these numbers are too good to be true and that something is wrong. Could someone let me know whether this looks fine?
For reference, the command I used is:
vg giraffe -p -t 80 -Z /path/to/file.gbz -d /path/to/file.dist -m /path/to/file.min -x /path/to/file.xg -f sample.fq > vg-sample.gam 2> vg-sample.out
What numbers do you find "too good to be true"? What are you mapping and against what? What were you expecting?
I would also be quite surprised if you got especially good performance on HiFi reads from a mapping tool that is designed for short-read data. One thing to note is that you got an average of 10508 bp of soft clips per read (43549949152 bp of soft clips over 4144653 alignments), which is roughly half the length of a HiFi read.
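To make the per-read figures easy to re-check, here is a minimal Python sketch that derives them from the vg stats text quoted in the question. The helper name `first_int` and the pasted `STATS` string are illustrative assumptions, not part of vg; the script only divides the totals that vg stats already reports.

```python
import re

# Figures copied from the `vg stats` output quoted in the question.
STATS = """\
Total alignments: 4144653
Total perfect: 420948
Total gapless (softclips allowed): 3318916
Softclips: 43549949152 bp in 5046954 read events
Total time: 498646 seconds
"""


def first_int(label, text):
    """Return the first integer following `label:` (hypothetical helper)."""
    match = re.search(rf"^{re.escape(label)}:\s*(\d+)", text, re.MULTILINE)
    if match is None:
        raise ValueError(f"label not found: {label}")
    return int(match.group(1))


total = first_int("Total alignments", STATS)
perfect = first_int("Total perfect", STATS)
gapless = first_int("Total gapless (softclips allowed)", STATS)
softclip_bp = first_int("Softclips", STATS)
seconds = first_int("Total time", STATS)

# Per-read ratios derived only from the reported totals.
softclip_per_read = softclip_bp / total
perfect_frac = perfect / total
gapless_frac = gapless / total
reads_per_sec = total / seconds

print(f"soft-clipped bp per read: {softclip_per_read:.1f}")
print(f"perfect alignments:       {perfect_frac:.1%}")
print(f"gapless alignments:       {gapless_frac:.1%}")
print(f"reads per second:         {reads_per_sec:.3f}")
```

A high mean mapping quality next to a soft-clip total this large suggests that only part of each read is being aligned, so the headline numbers alone do not show that the mapping is good.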
There are experimental features in vg giraffe to support HiFi alignment, but they're not fully baked yet.