21 months ago
wcs98
I am getting an error in the pathing step of the PHG (version 1.3) pipeline. I have 19 taxa, each with ~71,000 reference ranges, and AnchorWave haplotypes from assemblies, and I have been able to map short-read samples to the indexed pangenome. However, when I run the -imputePipeline plugin to "path", I encounter an error during the BestHaplotypePathPlugin. Do I need to use a smaller number of reference ranges/haplotypes to avoid this error?
Line that causes an error (it prints out all ~1.3 million hap ids):
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - CreateGraphUtils:addNodes - query=SELECT haplotypes_id, gamete_grp_id, haplotypes.ref_range_id, asm_contig, asm_start_coordinate, asm_end_coordinate, asm_strand, genome_file_id, seq_hash, seq_len FROM haplotypes WHERE haplotypes_id in (71255, 71256, 71257,...
Error:
[pool-1-thread-1] DEBUG net.maizegenetics.pangenome.api.CreateGraphUtils - [SQLITE_TOOBIG] String or BLOB exceeds size limit (statement too long)
org.sqlite.SQLiteException: [SQLITE_TOOBIG] String or BLOB exceeds size limit (statement too long)
Will you post the commands you used to run these steps? When creating the graph that is used for the imputation steps, you have included a list of hapids. This list is very long, and it makes the query sent to the db too large to process.
When you give the method names for creating the graph, the required hapids should be pulled automatically; they should not need to be explicitly defined unless you want or need only a specific subset of haplotypes. If you post your commands and indicate the specific ways you need the graph filtered, we can help you get the correct commands.
Thanks for your post. We've been able to reproduce this issue (from another instance where it was reported in-house) and have identified a bug in the code that affects SQLite db queries when imputing with long hapid lists. We hope to have this fixed in the next PHG release. I'll let you know when it is available.
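For anyone hitting the same SQLITE_TOOBIG error in their own scripts against a PHG SQLite db, the general workaround for an oversized IN (...) list is to split the ids into batches and run one smaller query per batch. The sketch below is only an illustration of that idea in Python; it is not the PHG fix and not PHG code. The function names `chunked` and `fetch_haplotypes` and the batch size of 500 are hypothetical choices; the table and column names are taken from the query shown in the log above.

```python
import sqlite3
from typing import Iterator, List, Sequence, Tuple


def chunked(ids: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of `ids`, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_haplotypes(
    conn: sqlite3.Connection,
    hap_ids: Sequence[int],
    chunk_size: int = 500,  # assumption: kept well below SQLite's bound-variable limit
) -> List[Tuple[int, int]]:
    """Fetch (haplotypes_id, ref_range_id) rows for many ids in batches.

    Hypothetical helper: each batch becomes one short parameterized
    statement instead of a single statement listing every id.
    """
    rows: List[Tuple[int, int]] = []
    for chunk in chunked(hap_ids, chunk_size):
        # One "?" placeholder per id in this batch, e.g. "?,?,?"
        placeholders = ",".join("?" * len(chunk))
        query = (
            "SELECT haplotypes_id, ref_range_id FROM haplotypes "
            f"WHERE haplotypes_id IN ({placeholders})"
        )
        rows.extend(conn.execute(query, list(chunk)).fetchall())
    return rows
```

Using bound parameters rather than pasting the ids into the SQL text keeps each statement short, and the batch size can be tuned to whatever limits the local SQLite build enforces.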