I am trying to find interesting patterns in a trio VCF and was wondering whether there are any tools that work like this:
- input: phased trio VCF
- output: genome regions where the child matches the parent haplotypes
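To make the desired output concrete, here is a minimal Python sketch of the idea. It is not an existing tool, and the function names (parse_gt, match_segments) are made up for illustration. It assumes the phased genotypes have already been read from the VCF into (position, child GT, parent GT) tuples for one parent, and it only uses sites where that parent is heterozygous, since those are the only sites that can tell the parent's two haplotypes apart.

```python
def parse_gt(gt):
    """Split a phased genotype such as '0|1' into its two alleles.

    Assumes a phased diploid call; an unphased call like '0/1' raises ValueError.
    """
    first, second = gt.split("|")
    return first, second


def match_segments(sites, child_hap):
    """Find runs where one child haplotype matches the same parental haplotype.

    sites: list of (pos, child_gt, parent_gt), sorted by position.
    child_hap: 0 or 1, the child haplotype (left or right of '|') to trace.
    Returns a list of (start, end, parent_hap, n_informative_sites).
    """
    segments = []
    current = None  # [start, end, parent_hap, n_informative_sites]
    for pos, child_gt, parent_gt in sites:
        child_allele = parse_gt(child_gt)[child_hap]
        parent = parse_gt(parent_gt)
        if parent[0] == parent[1]:
            continue  # homozygous parent: site cannot distinguish haplotypes
        if child_allele == parent[0]:
            hap = 0
        elif child_allele == parent[1]:
            hap = 1
        else:
            continue  # matches neither allele (genotyping error or de novo)
        if current is not None and current[2] == hap:
            current[1] = pos
            current[3] += 1
        else:
            if current is not None:
                segments.append(tuple(current))
            current = [pos, pos, hap, 1]
    if current is not None:
        segments.append(tuple(current))
    return segments


# Toy example: the child's first haplotype follows parental haplotype 0,
# then switches to parental haplotype 1 between positions 200 and 400.
toy_sites = [
    (100, "0|1", "0|1"),
    (200, "1|0", "1|0"),
    (300, "0|0", "0|0"),
    (400, "1|1", "0|1"),
    (500, "0|1", "1|0"),
]
toy_segments = match_segments(toy_sites, child_hap=0)
```

On real data a single genotyping or phasing error would split a long block in two, so a real tool would presumably need some smoothing or an error tolerance that this sketch does not have.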
I am new to this area of bioinformatics, so I was looking for keywords to search for. I found "local ancestry", "identity by descent", and maybe "chromosome painting" as possible keywords.
Local ancestry seems to identify segments that are similar to a 'population', while identity by descent is used for closer relatedness, e.g. families and trios. Chromosome painting is often related to the local ancestry concept as well.
One program I found was hap-ibd, which I tried running on a trio VCF from 1000 Genomes, but the segments it outputs are relatively short. My feeling is that, for a trio VCF, the genome should be much more completely covered by segments.
https://github.com/browning-lab/hap-ibd
Here is how I invoked hap-ibd:
for i in {1..22}; do
java -jar hap-ibd.jar gt=HG02024_VN049_KHVTrio.chr$i.vcf.gz map=maps/plink.chr$i.GRCh38.map out=out$i;
done;
Fig 1. Screenshot showing the hap-ibd output in JBrowse 2. There are very few, short "segments" (orange blocks), whereas I expected them to cover the genome more completely.
Fig 2. Zoomed-in view of one block. The output does seem valid, but it only matches one haplotype there, while I would also want to get the other haplotype.
Possibly similar thread: Locating IBD candidates with just VCF files
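For anyone wanting to check the segments outside the browser, here is a hedged Python sketch that turns hap-ibd output lines into BED-like rows. It assumes the eight tab-separated columns described in the hap-ibd README (sample 1, haplotype index 1, sample 2, haplotype index 2, chromosome, start, end, cM length); the function name ibd_to_bed and the example lines are invented for illustration and are not real output.

```python
def ibd_to_bed(lines, min_cm=0.0):
    """Convert hap-ibd segment lines into sorted BED-like tuples.

    Assumed column order (from the hap-ibd README):
    sample1, hap1, sample2, hap2, chrom, start, end, cM.
    Returns (chrom, bed_start, bed_end, name, cM) tuples.
    """
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        s1, h1, s2, h2, chrom, start, end, cm = fields[:8]
        if float(cm) < min_cm:
            continue
        # Name records which haplotype of which sample is matched.
        name = f"{s1}:{h1}-{s2}:{h2}"
        # hap-ibd reports 1-based marker positions; BED is 0-based, half-open.
        rows.append((chrom, int(start) - 1, int(end), name, float(cm)))
    return sorted(rows)


# Made-up example lines, only to show the expected shape of the input.
example_lines = [
    "A\t1\tB\t2\tchr1\t1000\t5000\t3.5\n",
    "A\t2\tC\t1\tchr1\t200\t900\t1.2\n",
]
bed_rows = ibd_to_bed(example_lines, min_cm=2.0)
```

For real output the lines could be read with gzip.open(path, "rt"), since hap-ibd writes gzip-compressed files. Keeping the haplotype indices in the name column makes it possible to see which of the child's two haplotypes a segment belongs to.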
I tried the KING tool with its "IBD segments" function, which seemed like a good lead, but it did not output any "segments", just a couple of files:
ex2.kin0 ex2X.kin ex2X.kin0
If the input VCF to hap-ibd is not phased, I don't know what output you would get; in fact, I would expect the program to crash. Is there no option in hap-ibd that tells it you are giving it a parent-parent-child trio?
Thanks for the feedback. I could not figure out how to create the .fam files (or something like that) needed to provide the proper 'family structure' to KING or PLINK, which I think do accept such parameters. hap-ibd does not appear to have a parameter for 'family structure'. https://github.com/browning-lab/hap-ibd
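For reference, a PLINK-style .fam file is a plain text file with six whitespace-separated columns per sample: family ID, individual ID, father ID, mother ID, sex (1 = male, 2 = female, 0 = unknown) and phenotype (-9 = missing). A small sketch of writing one for a trio might look like the following. The helper name trio_fam_lines is made up, and FATHER_ID and MOTHER_ID are placeholders, not the real parent sample names for this trio.

```python
def trio_fam_lines(family_id, child_id, father_id, mother_id, child_sex=0):
    """Build the three .fam lines for a father, mother, child trio.

    Columns: family ID, individual ID, father ID, mother ID,
    sex (1 = male, 2 = female, 0 = unknown), phenotype (-9 = missing).
    Parents are written as founders, i.e. with '0' for their own parents.
    """
    rows = [
        (family_id, father_id, "0", "0", 1, -9),
        (family_id, mother_id, "0", "0", 2, -9),
        (family_id, child_id, father_id, mother_id, child_sex, -9),
    ]
    return [" ".join(str(value) for value in row) for row in rows]


# FATHER_ID and MOTHER_ID are placeholders; replace them with the parent
# sample names that actually appear in the VCF header.
fam_lines = trio_fam_lines("VN049", "HG02024", "FATHER_ID", "MOTHER_ID")
fam_text = "\n".join(fam_lines) + "\n"
```

The individual IDs have to match the sample names in the genotype data, otherwise the family structure will not be applied to the right samples.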
Note that the VCF is phased, though (it has e.g. 0|1 type calls). I am testing with data from https://hgdownload.soe.ucsc.edu/gbdb/hg38/1000Genomes/trio/HG02024_VN049_KHV/
Subtext: I am a tool developer trying to find a way to visualize these large haplotype blocks in JBrowse 2, and I am just wondering what tools, if any, are used in the real world for this. I recently made JBrowse able to split the different phases into multiple rows, as seen in the screenshot.
I see. I have used hap-ibd with success in the past; from a trio you should get plenty of segments longer than 1 Mbp.
I would expect that people mostly use PLINK for these matters, however. I have not used it for this myself yet, so I can't tell you which options are correct. There should be tutorials all over the interwebs, though.
If you have a tutorial that might help, please link it as an answer, even better if it works on the specific dataset that I linked. If I find an answer, I will update this thread, but for now I am still looking.
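One way to check that expectation against actual output is to count the segments of at least 1 Mbp and compute how much of a chromosome they cover. The Python sketch below does this for a list of (start, end) intervals in half-open coordinates; the function names and the toy numbers are invented for illustration, and the chromosome length would have to come from the reference assembly.

```python
def count_long_segments(segments, min_length=1_000_000):
    """Count segments whose length is at least min_length bases."""
    return sum(1 for start, end in segments if end - start >= min_length)


def covered_fraction(segments, chrom_length):
    """Fraction of a chromosome covered by the union of the segments.

    segments: iterable of (start, end) pairs, half-open, any order.
    Overlapping segments are merged so shared bases are counted once.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(segments):
        if cur_end is None or start > cur_end:
            # Gap before this segment: close the previous merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / chrom_length


# Toy intervals on an imaginary 1000 bp chromosome.
toy = [(0, 100), (50, 150), (300, 400)]
toy_fraction = covered_fraction(toy, chrom_length=1000)
```

If a parent-child comparison gives a covered fraction far below 1, that would suggest a problem with the inputs or parameters rather than with the biology, since a child shares one haplotype with each parent along essentially the whole autosome.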