Find "phased segments" from trio VCF?
1
1
Entering edit mode
8 weeks ago
cmdcolin ★ 4.2k

I was trying to find interesting patterns in trio VCF and was wondering if there are any tools that can:

  • input: phased trio VCF
  • output: genome regions where the child matches the parent haplotypes

I am new to this area of bioinfo, and so I was trying to find the keywords to search for, and I found "local ancestry" and "identity by descent" and maybe "chromosome painting" to be possible keywords

local ancestry seems to identify segments that are similar to a 'population', while identity by descent for closer relatedness e.g. families and trios. chromosome painting is often related to the local ancestry concept also.

One program i found was hapIBD, which I tried running on a trio VCF from 1000 genomes but the segments that it outputs are relatively short, and my feeling is that, for trio VCF, that the genome should be much more completely covered by segments

https://github.com/browning-lab/hap-ibd

here is how i invoked hap-ibd

for i in {1..22}; do
    java -jar hap-ibd.jar gt=HG02024_VN049_KHVTrio.chr$i.vcf.gz map=maps/plink.chr$i.GRCh38.map out=out$i;
done;

enter image description here

fig 1. screenshot showing the output of hapIBD in jbrowse 2. there are very few and short "segments" (orange blocks), where my expectation would be to more completely cover the genome

enter image description here

fig 2. showing a zoom in on a block. it does seem to have some valid output, but it also only matches one haplotype there, while i'd want to also get the other haplotype

phased vcf genotyping variants phasing • 838 views
ADD COMMENT
0
Entering edit mode
ADD REPLY
0
Entering edit mode

I tried out the KING tool with the "IBD segments" function, which seemed like a good lead, but it did not output anything

 plink --vcf trio.vcf --make-bed --out ex
 king -b ex.bed --ibdseg --prefix ex2

this did not output any "segments" just a couple files

ex2.kin0 ex2X.kin ex2X.kin0

ADD REPLY
0
Entering edit mode

If you do not phase your input VCF to hap-ibd, I don't know what output you get, in fact I would expect the program to crash. Is there not an option to hap-ibd that tells it you're giving it a parent parent child trio?

ADD REPLY
0
Entering edit mode

thanks for the feedback, I could not figure out how to create the .fam files or something like that to provide the proper 'family structure' to "king" or "plink", which i think do allow providing these params. hap-ibd does not appear have a parameter for 'family structure'. https://github.com/browning-lab/hap-ibd

note that the VCF is phased though (has e.g. 0|1 type calls), i am testing out using data from https://hgdownload.soe.ucsc.edu/gbdb/hg38/1000Genomes/trio/HG02024_VN049_KHV/

subtext: I am a tool developer trying to find a way of visualizing these large haplotyp blocks in JBrowse 2 and just wondering what tools, if any, are used in the real world for this. I recently made JBrowse able to split the different phase into multiple rows, as seen in the screenshot.

ADD REPLY
0
Entering edit mode

I see. I have used hap-ibd in the past with success, from a trio you should get plenty of segments longer than 1 Mbp.

I would expect that people mostly use PLINK for these matters, however. I have not used it myself for this, yet, so can't help you with what options are correct. There should be tutorials all over the interwebs though.

ADD REPLY
0
Entering edit mode

If you have a tutorial that might help, please link it as an answer. even better if it works on the specific dataset that I linked. if I find an answer, I will update this thread but for now I am still looking.

ADD REPLY
0
Entering edit mode
29 days ago
cmdcolin ★ 4.2k

I made some progress with this using the hapIBD program

I was messing around with changing some of the settings somewhat randomly, but by lowering the size of allowable blocks, I was able to go from nearly empty output (my original post shows one block hapIBD outputted, but it was very sparse with default params) to output a lot of data.

the output is somewhat fragmented, but it could probably be cleaned up with some further investigations into params, or post-processing

my command

testing on chr1 of https://hgdownload.soe.ucsc.edu/gbdb/hg38/1000Genomes/trio/HG02024_VN049_KHV/ (and the map file provided in hap-ibd readme https://bochet.gcc.biostat.washington.edu/beagle/genetic_maps/plink.GRCh38.map.zip)

java -jar hap-ibd.jar gt=HG02024_VN049_KHVTrio.chr1.vcf map=maps/plink.chr1.GRCh38.map out=myoutput min-output=0.1 min-seed=0.1

picture shows the correspondence between hap-ibd output and the raw VCF data with a trio VCF (6 rows, each row being a haplotype of the sample) which I marked up in google slides to help illustrate the connection between the hap-ibd output and jbrowse raw phased vcf display

running genome wide would probably be a bash loop

for i in {1..22}; do java -jar hap-ibd.jar gt=HG02024_VN049_KHVTrio.chr$i.vcf.gz map=maps/plink.chr$i.GRCh38.map out=out$i; done;

enter image description here

ADD COMMENT
0
Entering edit mode

here is a short script that convert the 'hap-ibd' output to a 'bed' file that can be loaded in the browser

#!/bin/bash
# create a bed file from hap-ibd output with columns "chr, start, end, sample_name1 hap1 sample_name2 hap2"
# add this config to jbrowse get(feature,'sample1')+':HP'+get(feature,'hap1')+' '+get(feature,'sample2')+':HP'+get(feature,'hap2')
zcat out1234.ibd.gz | cut -f 5,6,7 >! out.bed
zcat out1234.ibd.gz | cut -f 1,2,3,4 >! samples.txt
echo "#chr start end sample1 hap1 sample2 hap2" > out.bed
paste out.bed samples.txt >> out.bed
view raw gistfile1.txt hosted with ❤ by GitHub

ADD REPLY
0
Entering edit mode

I also realized that this screenshot is a great rendering of what crossing over/recombination looks like in practice. the mom and the dad both have crossing over in this region

enter image description here

ADD REPLY

Login before adding your answer.

Traffic: 1665 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6