Question

Construct a pantranscriptome reference with two haplotypes from a single sample.

18 months ago

Juhyun • 0

Hello,

I am trying to construct a graph reference for rpvg using two haplotypes from a single sample.

I created a GFA file from the two haplotypes of a single sample with pggb. Then I generated a VCF file and graph.pg from the GFA file using vg deconstruct. I tried to run vg rna or vg autoindex -w mpmap with this VCF and graph.pg, but I encountered the following error.

[IndexRegistry]: Checking for phasing in VCF(s).
error:[vg autoindex] Input is not sufficient to create indexes
Inputs
    GTF/GFF
    Reference FASTA
    VCF
are insufficient to create target index Haplotype-Transcript GBWT

I believe the error may be caused by the absence of heterozygous variants in the VCF file, which makes it impossible to phase the VCF.
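One way to test that guess before rerunning vg is to look at the genotype column directly. The following is a minimal sketch, not part of vg: it counts how many genotype calls use the phased separator `|` versus the unphased separator `/`, and how many are heterozygous. The function name `summarize_phasing`, the sample name, and the example records are all made up for illustration.

```python
def summarize_phasing(vcf_lines):
    """Count phased, unphased, heterozygous, and haploid/missing GT calls."""
    counts = {"phased": 0, "unphased": 0, "het": 0, "haploid_or_missing": 0}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # record has no sample columns
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        for sample in fields[9:]:
            parts = sample.split(":")
            gt = parts[gt_idx] if gt_idx < len(parts) else "."
            if "|" in gt:
                alleles = gt.split("|")
                counts["phased"] += 1
            elif "/" in gt:
                alleles = gt.split("/")
                counts["unphased"] += 1
            else:
                counts["haploid_or_missing"] += 1
                continue
            called = [a for a in alleles if a != "."]
            if len(set(called)) > 1:
                counts["het"] += 1
    return counts


# Hypothetical records, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample_A",
    "chr1\t100\t.\tA\tG\t.\t.\t.\tGT\t0|1",
    "chr1\t200\t.\tC\tT\t.\t.\t.\tGT\t1/1",
    "chr1\t300\t.\tG\tA\t.\t.\t.\tGT\t1",
]
print(summarize_phasing(example))
```

If every call lands in `unphased` or `haploid_or_missing`, the VCF carries no phasing information, whether or not heterozygous sites are present.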

In this situation, is there any way to construct an RNA pantranscriptome reference using only two haplotypes, or using only a GFA file?

Thank you

vg • 1.2k views

ADD COMMENT • link updated 18 months ago by Jordan M Eizenga ▴ 660 • written 18 months ago by Juhyun • 0

Are you running an up-to-date version of VG? In the most recent version, vg autoindex should be able to make the vg mpmap indexes without also constructing a haplotype-transcript GBWT. However, if you need the haplotype transcripts (e.g. for downstream use in rpvg), then the VCF must be phased.

ADD REPLY • link 18 months ago by Jordan M Eizenga ▴ 660

Thank you for your prompt response!

The vg version I used is

version v1.40.0 "Suardi"

And I plan to use rpvg to identify haplotype-specific transcripts.

So, if I want to identify and quantify haplotype-specific transcripts from a sample, is the best approach to construct a pantranscriptome reference from multiple samples, including the sample_A I want to analyze, using vg autoindex -w mpmap? After mapping reads from sample_A to that pantranscriptome with vg mpmap, can I then run rpvg?

ADD REPLY • link 18 months ago by Juhyun • 0

The best way would probably be to use the GFA directly rather than converting it to a VCF. I believe some of the features for creating spliced pangenome graphs from GFA input in vg autoindex were added after 1.40.0, so you should update your version too.
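As a rough illustration of what a GFA-based invocation could look like, here is a small Python sketch that only assembles the command line and prints it, without running anything. The file names and the helper `autoindex_command` are hypothetical, and the long option names (`--workflow`, `--prefix`, `--gfa`, `--tx-gff`) are assumptions that should be checked against `vg autoindex --help` for the installed version.

```python
import shlex


def autoindex_command(gfa, annotation, prefix, workflows=("mpmap",)):
    """Build (but do not run) a vg autoindex command that takes a GFA as input."""
    cmd = ["vg", "autoindex"]
    for wf in workflows:
        cmd += ["--workflow", wf]  # option name assumed; verify with --help
    cmd += ["--prefix", prefix, "--gfa", gfa, "--tx-gff", annotation]
    return cmd


# Hypothetical file names, for illustration only.
cmd = autoindex_command("sample_A.gfa", "genes.gtf", "sample_A_rna")
print(shlex.join(cmd))
```

Printing the command first makes it easy to inspect before passing the list to something like `subprocess.run(cmd, check=True)`.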

Also, there are some pitfalls in using graphs from PGGB. First, VG expects haplotypes to be expressed as W lines in the GFA, but PGGB uses P lines (see the spec for details), so you will need to convert them. Second, the more complicated structure of PGGB graphs makes it challenging for VG to project gene annotations between the haplotypes. You can get around this limitation by using a liftover tool to get annotations against the haplotypes and providing them with -H. Finally, the mapping tools in VG tend not to perform as well on PGGB's more complicated graph structures.
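To make the P-line to W-line conversion concrete, here is a minimal Python sketch. It is an illustration under several assumptions, not the method vg or PGGB uses: path names follow the PanSN pattern `sample#haplotype#contig`, each path covers its whole contig starting at position 0, and segments do not overlap. The function names are made up. A strict parser may also expect the header to declare `VN:Z:1.1`, since W lines belong to GFA 1.1; this sketch leaves the header untouched.

```python
def segment_lengths(lines):
    """Map segment ID to length, from the S-line sequence or its LN:i: tag."""
    lengths = {}
    for line in lines:
        f = line.split("\t")
        if f[0] != "S":
            continue
        if f[2] != "*":
            lengths[f[1]] = len(f[2])
        else:
            for tag in f[3:]:
                if tag.startswith("LN:i:"):
                    lengths[f[1]] = int(tag[5:])
    return lengths


def p_lines_to_w_lines(gfa_lines, sep="#"):
    """Rewrite PanSN-named P lines as W lines; pass other lines through."""
    lines = [l.rstrip("\n") for l in gfa_lines]
    lengths = segment_lengths(lines)
    out = []
    for line in lines:
        f = line.split("\t")
        if f[0] != "P":
            out.append(line)
            continue
        parts = f[1].split(sep)
        if len(parts) < 3 or not parts[1].isdigit():
            out.append(line)  # not PanSN-style: leave the path as it is
            continue
        sample, hap, contig = parts[0], parts[1], sep.join(parts[2:])
        steps = f[2].split(",")
        # P steps look like "12+" or "12-"; W walks use ">12" or "<12".
        walk = "".join((">" if s.endswith("+") else "<") + s[:-1] for s in steps)
        ids = [s[:-1] for s in steps]
        if all(i in lengths for i in ids):
            start, end = "0", str(sum(lengths[i] for i in ids))
        else:
            start, end = "*", "*"  # length unknown
        out.append("\t".join(["W", sample, hap, contig, start, end, walk]))
    return out


# Tiny made-up graph, for illustration only.
example_gfa = [
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tGG",
    "S\t3\tTTA",
    "P\tsample_A#1#chr1\t1+,2+,3+\t*",
    "P\tsample_A#2#chr1\t1+,2-,3+\t*",
]
for row in p_lines_to_w_lines(example_gfa):
    print(row)
```

Paths whose names do not match the assumed pattern are passed through unchanged, so it is worth checking the output for leftover P lines.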

Overall, my recommendation at this point would be to use Minigraph-Cactus if your goal is to use the pangenome for short read mapping. In the cases I know of where users tried to use vg mpmap or vg giraffe on PGGB graphs, they haven't been very successful.

ADD REPLY • link 18 months ago by Jordan M Eizenga ▴ 660