Hi all,
I’m working with a population VCF from HGDP that contains many variations across different populations and individuals. To build a pangenome for a single population, I subsetted the VCF to include only the individuals I’m interested in. However, I noticed that the subsetted VCF still contains variants that none of my selected individuals carry (i.e., all have ./. or 0/0 in the GT field), likely because those variants were present in populations that I excluded.
I used vg autoindex to build a pangenome from this subsetted VCF, and it completed successfully. I want to ensure that these excluded variants do not appear in the resulting pangenome. In theory, they shouldn’t, but is there a way to verify this? Specifically, how can I check my .xg or .gcsa files to confirm that these variants do not create "bubbles" or paths in the pangenome?
Thanks for your help!
Cheers!
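For reference, the records described above (every selected sample is ./. or 0/0) can be listed directly from the subsetted VCF. This is only a minimal sketch: the function names carries_alt and uncarried_records are made up for illustration, and it assumes an uncompressed, tab-delimited VCF with a GT entry in the FORMAT column.

```python
def carries_alt(gt: str) -> bool:
    """Return True if a GT string such as '0/1' or '1|0' has a non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def uncarried_records(vcf_lines):
    """Yield (chrom, pos) for records where no sample carries an ALT allele.

    Hypothetical helper: expects plain-text VCF lines, not a bgzipped file.
    """
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        genotypes = [sample.split(":")[gt_idx] for sample in fields[9:]]
        if not any(carries_alt(gt) for gt in genotypes):
            yield fields[0], int(fields[1])


# Tiny in-memory example with two samples.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "1\t100\t.\tA\tG\t.\t.\t.\tGT\t./.\t0/0",
    "1\t200\t.\tC\tT\t.\t.\t.\tGT:DP\t0|1:5\t0/0:3",
]
for chrom, pos in uncarried_records(example):
    print(chrom, pos)
```

On a real file, the same function can be fed an open file handle (for a gzipped VCF, one opened in text mode with Python's gzip module). The positions it reports are the ones to look for in the graph.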
I see, I didn’t expect that... thanks for the insight. Is there a way I can visualise this? For instance, if I have a variant at chromosome 1, position 100, where all my subsetted individuals have "./." as the genotype, how could I "see" this? I’m working in an HPC environment, so I don’t have access to a visual interface to use vg view. Thanks a lot!
If you extract a small region around a path position, you can use vg view to convert it into GFA format, which is text-based and reasonably readable for small graphs. For your example, it would look something like this: