Hi,
I am seeking your guidance on using PHG for haplotype generation and skim-seq imputation. We have been using the reference range summary file to retrieve haplotype paths for our reference lines. However, the haplotype paths obtained this way often contain a large proportion of missing data. Could you advise us on a better approach for obtaining the reference lines' haplotype paths from PHG? Thank you in advance for your time and assistance.
Respectfully,
Wen
Is this a PHGv1 database that you have created? What is the "reference range summary file" referred to in your question?
Yes, a PHGv1 database has been created, and imputation went well. The reference range summary file was generated by the ReferenceRangeSummaryPlugin and contains the Haplotype_ID, RefRange_ID, Range, and Taxa information.
The ReferenceRangeSummaryPlugin provides a summary of the data for each reference range, based on the haplotypes created by assembly or WGS processing. The output from this plugin depends on the methods given by the "methods" input parameter. Haplotypes may be missing for a reference range if the method used to create those haplotypes was not included.
You will also see that non-reference taxa may be missing a haplotype in some reference ranges because the aligner did not align sequence to that reference range.
This link also discusses an issue with low haplotype counts caused by the parameters used in the BestHaplotypePathPlugin. Note that it suggests setting the "usebf" parameter to "false". The linked thread is titled "Expected number of haplotype IDs per path in PHG".
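To see which taxa are missing haplotypes, and in how many reference ranges, it may help to tabulate the summary file directly. The snippet below is only a rough sketch, not part of PHG. It assumes the summary file is tab-delimited with a header row containing the RefRange_ID and Taxa columns mentioned above, and that taxa sharing a haplotype may be listed comma-separated in the Taxa column. Both are assumptions about the file layout, so adjust the delimiter and parsing to match your actual output. The function name `missing_fraction_by_taxon` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def missing_fraction_by_taxon(rows):
    """Return {taxon: fraction of reference ranges with no haplotype}.

    rows: iterable of dicts with 'RefRange_ID' and 'Taxa' keys
    (assumed column names, taken from the summary file description).
    """
    all_ranges = set()
    ranges_by_taxon = defaultdict(set)
    for row in rows:
        ref_range = row["RefRange_ID"]
        all_ranges.add(ref_range)
        # Assumption: taxa sharing one haplotype are comma-separated.
        for taxon in row["Taxa"].split(","):
            taxon = taxon.strip()
            if taxon:
                ranges_by_taxon[taxon].add(ref_range)
    total = len(all_ranges)
    # A taxon is "missing" in every range where it has no haplotype row.
    return {t: 1 - len(r) / total for t, r in ranges_by_taxon.items()}


# Made-up sample data in the assumed layout; replace with open("your_file")
sample = (
    "Haplotype_ID\tRefRange_ID\tRange\tTaxa\n"
    "1\t1\tchr1:1-100\tRef\n"
    "2\t1\tchr1:1-100\tLineA\n"
    "3\t2\tchr1:101-200\tRef\n"
)
reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
result = missing_fraction_by_taxon(reader)
for taxon, fraction in sorted(result.items()):
    print(taxon, fraction)
```

If the reference lines show a high missing fraction here, the "methods" parameter is worth checking first. If only the non-reference taxa do, alignment gaps are the more likely cause.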