Adding new assemblies to a previously built Practical Haplotype Graph (v2)
8 months ago

Hi, we are testing phg_v2 by following the steps documented at build_and_load.

My first question is whether the workflow summarized in the main README:

phg create-ranges ...
phg align-assemblies ...
phg agc-compress ...
phg create-ref-vcf ...
phg create-maf-vcf ...
phg load-vcf ...

i) can be modified so that new assemblies are aligned after the first batch of VCF files has already been loaded, or ii) requires aligning all assemblies again and building a new PHG. Obviously I would prefer i.

My second question is how you can find out whether an assembly has a very poor alignment and you are better off leaving it out.

Thanks for your help, Bruno

pangenome phg_v2 plants PHG • 369 views
8 months ago
lcj34 ▴ 420

Bruno: You can add new assemblies at any time. The vcfs are not dependent on one another. The only issue would be if you want to impute against existing haplotypes. Those imputations would need to be re-run if you wanted additional assembly haplotypes considered.

Please note there is a "pre-processing" step that comes before your list. That is the "annotate-fastas" step. The fastas output from this step are the fastas that should be input for the align-assemblies and agc-compress steps.

Regarding your question about metrics: phgv1 has a plugin called VCFMetrics. It takes either a single vcf file or a folder of vcf files and outputs a file of metrics for each vcf file, including number of SNPs, number of indels, number of bases aligned, bases inserted, bases deleted, bases gapped, total bases and a list of indel sizes. We will be porting this metrics plugin (along with others) to phgv2 soon.
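
Until that port is available, a rough stand-in for two of those metrics (SNP count, indel count and indel sizes) could look like the sketch below. This is not the VCFMetrics plugin and not part of phgv2; the function name `summarize_vcf_lines` is made up for illustration, and it assumes plain tab-separated VCF records. Symbolic alleles (such as `<NON_REF>`), breakends and multi-base substitutions are skipped, and it does not compute aligned or gapped bases.

```python
from collections import Counter


def summarize_vcf_lines(lines):
    """Count SNPs and indels in an iterable of VCF text lines.

    Hypothetical helper for illustration only; not the PHG VCFMetrics plugin.
    """
    n_snps = 0
    n_indels = 0
    indel_sizes = Counter()  # signed size: > 0 insertion, < 0 deletion

    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and header lines
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        ref = fields[3]
        for alt in fields[4].split(","):
            # Skip missing, spanning-deletion, symbolic and breakend alleles
            if alt in (".", "*") or alt.startswith("<") or "[" in alt or "]" in alt:
                continue
            if len(ref) == 1 and len(alt) == 1:
                n_snps += 1
            elif len(ref) != len(alt):
                n_indels += 1
                indel_sizes[len(alt) - len(ref)] += 1
            # Equal-length alleles longer than 1 bp (MNPs) are ignored here

    return {"snps": n_snps, "indels": n_indels, "indel_sizes": dict(indel_sizes)}


if __name__ == "__main__":
    demo = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t10\t.\tA\tG\t.\t.\t.",
        "chr1\t20\t.\tAT\tA\t.\t.\t.",
        "chr1\t30\t.\tC\tCGG,<NON_REF>\t.\t.\t.",
    ]
    print(summarize_vcf_lines(demo))
```

For a real file you could pass an open file handle instead of the demo list (for compressed files, `gzip.open(path, "rt")` should work). Comparing these counts across assemblies is only a crude hint; an assembly whose numbers are far out of line with the others would be one to inspect more closely.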
