Hi there!
I had a quick question about the "properly paired" output of vg stats. I ran vg stats on 3 files after aligning them to the CHM13-T2T reference with vg giraffe. Each file reported a value of 2000 for properly paired, and the following message appeared: "Cannot cluster reads with a fragment distance smaller than read distance. Falling back on single-end mapping".
Then I removed duplicates from the files using samtools markdup, and the properly paired values were much higher.
I was wondering: is 2000 a default value for properly paired? Is there a reason this value kept coming up for all three files? Also, is there a vg command that removes duplicates from a file, or that reports whether a file contains duplicates?
Thank you for taking the time to read this and help!