Hello,
I am trying to construct a graph reference for rpvg using two haplotypes from a single sample.
I created a GFA file from two haplotypes of a single sample with pggb. Then I generated a VCF file and graph.pg from the GFA file using the vg deconstruct command. I attempted to run vg rna or vg autoindex -w mpmap with this VCF and graph.pg, but I encountered the following error:
[IndexRegistry]: Checking for phasing in VCF(s).
error:[vg autoindex] Input is not sufficient to create indexes
Inputs
GTF/GFF
Reference FASTA
VCF
are insufficient to create target index Haplotype-Transcript GBWT
I believe the error may be caused by the absence of heterozygous variants in the VCF file, which makes it impossible to phase the VCF.
In this situation, is there any way to construct an RNA pantranscriptome reference using only two haplotypes, or with only a GFA file?
Thank you
Are you running an up-to-date VG version? In the most recent version of VG, vg autoindex should be able to make the vg mpmap indexes without also constructing a haplotype-transcript GBWT. However, if you need the haplotype transcripts (e.g. for use downstream in rpvg), then phasing in the VCF is strictly essential.
Thank you for your prompt response!
The vg version I used is
And I plan to use rpvg to identify haplotype-specific transcripts.
So, if I want to identify and quantify haplotype-specific transcripts from a sample, is the best approach to construct a pantranscriptome reference from multiple samples, including the sample_A I want to analyze, using vg autoindex -w mpmap? After mapping reads from sample_A to that pantranscriptome with vg mpmap, can I then run rpvg?
The best way would probably be to use a GFA directly rather than converting it to a VCF. I believe some of the features for creating spliced pangenome graphs with GFA input in vg autoindex were added since 1.40.0, so you should update the version too.
Also, there are some pitfalls in using graphs from PGGB. First, VG expects haplotypes to be expressed as W lines in the GFA, but PGGB uses P lines (see the spec for details), so you will need to convert them. Second, the more complicated structure of PGGB graphs makes it challenging for VG to project gene annotations between the haplotypes. You can get around this limitation by using a liftover tool to get annotations against the haplotypes and providing them with -H. Finally, the mapping tools in VG tend not to perform as well on PGGB's more complicated graph structures.
Overall, my recommendation at this point would be to use Minigraph-Cactus if your goal is to use the pangenome for short read mapping. In the cases I know of where users tried to use vg mpmap or vg giraffe on PGGB graphs, they haven't been very successful.
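To illustrate the P-line to W-line conversion mentioned above, here is a minimal Python sketch. It rests on several assumptions that are not stated in this thread: path names follow the PanSN convention (sample#haplotype#contig), S lines carry explicit sequences, and the graph has no overlaps between segments. The function name p_to_w_lines is hypothetical and is not part of vg or PGGB, so treat this as a way to see what the conversion involves rather than a tested pipeline step.

```python
def p_to_w_lines(gfa_lines):
    """Rewrite GFA P lines as W lines (hypothetical helper, not a vg/PGGB API).

    Assumes PanSN path names (sample#haplotype#contig), explicit sequences
    on S lines, and no segment overlaps.
    """
    lines = [ln.rstrip("\n") for ln in gfa_lines]

    # First pass: record segment lengths so each walk's end coordinate is known.
    seg_len = {}
    for ln in lines:
        if ln.startswith("S\t"):
            fields = ln.split("\t")
            seg_len[fields[1]] = len(fields[2])

    out = []
    for ln in lines:
        if ln.startswith("H\t"):
            # W lines belong to GFA 1.1, so bump the declared version if present.
            out.append(ln.replace("VN:Z:1.0", "VN:Z:1.1"))
            continue
        if not ln.startswith("P\t"):
            out.append(ln)
            continue

        fields = ln.split("\t")
        name, steps = fields[1], fields[2].split(",")
        parts = name.split("#")
        if len(parts) != 3 or not parts[1].isdigit():
            # Not a PanSN name: leave the path untouched rather than guess.
            out.append(ln)
            continue

        sample, hap, contig = parts
        # P steps look like "12+" or "12-"; W walks look like ">12" or "<12".
        walk = "".join((">" if s[-1] == "+" else "<") + s[:-1] for s in steps)
        length = sum(seg_len[s[:-1]] for s in steps)
        out.append("\t".join(["W", sample, hap, contig, "0", str(length), walk]))
    return out
```

One way to use it would be to read the PGGB GFA with open(...), pass the file object to p_to_w_lines, and write the returned lines to a new file joined by newlines. Paths whose names do not match the assumed naming scheme are passed through unchanged, so it is worth checking the output for any remaining P lines before giving it to vg autoindex.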