I have a phased haplotype VCF file that looks like this:
##fileformat=VCFv4.0
##reference=human_b36_both.fasta
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA12891 NA12892 NA12878 NA19239 NA19238 NA19240
22 47812545 rs5769818 A G . PASS GT 0|1 0|1 0|1 1|1 1|1 1|1
22 47812939 rs9616222 A G . PASS GT 0|1 0|1 0|1 1|0 1|1 1|1
22 47813002 rs5769819 A G . PASS GT 0|1 0|1 0|1 1|0 1|1 1|1
22 47813051 rs5769820 G A . PASS GT 1|0 1|0 1|0 1|0 1|1 1|1
22 47813163 rs5769821 A G . PASS GT 0|1 0|1 0|1 1|0 1|1 1|1
I do not have any additional files (e.g. PED files) for this dataset.
I would like to calculate identity by state (IBS) between all pairs of individuals. Is there a way to convert this file into PLINK format, or are there any tools that can take a VCF as input and calculate IBS?
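For reference, pairwise IBS on phased biallelic data is simple enough to compute directly. Below is a minimal Python sketch, not any specific tool's implementation. It assumes biallelic sites and diploid GT calls, ignores phase (IBS does not depend on it), and reports the mean proportion of alleles shared per site. The function names are hypothetical. The data rows as pasted above have only eight fixed fields before the genotypes (no INFO value), although the header declares nine, so the sketch writes "." for INFO. If the real file looks like that too, it may confuse VCF readers.

```python
from itertools import combinations

# The five chr22 sites from the question. INFO is written as "." here,
# because a VCF data line needs a value in every fixed column.
VCF_TEXT = """\
##fileformat=VCFv4.0
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA12891 NA12892 NA12878 NA19239 NA19238 NA19240
22 47812545 rs5769818 A G . PASS . GT 0|1 0|1 0|1 1|1 1|1 1|1
22 47812939 rs9616222 A G . PASS . GT 0|1 0|1 0|1 1|0 1|1 1|1
22 47813002 rs5769819 A G . PASS . GT 0|1 0|1 0|1 1|0 1|1 1|1
22 47813051 rs5769820 G A . PASS . GT 1|0 1|0 1|0 1|0 1|1 1|1
22 47813163 rs5769821 A G . PASS . GT 0|1 0|1 0|1 1|0 1|1 1|1
"""


def parse_dosages(lines):
    """Return (sample names, per-site lists of ALT-allele counts; None = missing).

    Assumes biallelic sites and diploid GT calls; "|" and "/" are treated alike.
    """
    samples, sites = [], []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("##"):
            continue
        if fields[0] == "#CHROM":
            samples = fields[9:]
            continue
        gt_idx = fields[8].split(":").index("GT")
        site = []
        for call in fields[9:]:
            alleles = call.split(":")[gt_idx].replace("|", "/").split("/")
            site.append(None if "." in alleles else sum(int(a) for a in alleles))
        sites.append(site)
    return samples, sites


def pairwise_ibs(samples, sites):
    """Mean IBS proportion per pair: at each site, alleles shared / 2.

    For a biallelic site with ALT counts a and b, the number of alleles
    shared identical by state is 2 - |a - b|.
    """
    result = {}
    for i, j in combinations(range(len(samples)), 2):
        shared = n_sites = 0
        for site in sites:
            a, b = site[i], site[j]
            if a is None or b is None:
                continue  # skip sites where either genotype is missing
            shared += 2 - abs(a - b)
            n_sites += 1
        result[(samples[i], samples[j])] = (
            shared / (2 * n_sites) if n_sites else float("nan")
        )
    return result


if __name__ == "__main__":
    names, dosages = parse_dosages(VCF_TEXT.splitlines())
    for (s1, s2), ibs in pairwise_ibs(names, dosages).items():
        print(f"{s1}\t{s2}\t{ibs:.2f}")
```

On these five sites, NA12891 and NA12892 have identical genotypes everywhere, so their IBS proportion is 1.0. For a real file, pass an open file handle instead of VCF_TEXT.splitlines().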
Thank you
Thanks Kevin, I tried that, but I ended up with a file that has 0 for each individual.
Any log or error messages?
No. This is what I did (playing around with a chunk of chr22).
The result from the above is an empty file.
Then I tried to convert the BED/BIM/FAM files to MAP/PED.
The PED file contains only 0s.
If you avoid using --make-bed and instead produce a plain-text PLINK dataset (e.g. with --recode), can you then see data in that?
Still the same. I don't think PLINK can handle a phased haplotype file in the format 0|1, where 0 indicates the reference allele and 1 the alternate allele. Can it?
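To show what a plain-text PLINK dataset built from this VCF should contain, and how the phased codes map to alleles (0 = REF, 1 = first ALT), here is a minimal Python sketch that writes PED/MAP lines by hand. It is only an illustration, not PLINK's own converter (PLINK 1.9 reads VCF directly with --vcf). The function name is hypothetical. It assumes diploid GT calls, sets FID = IID = sample name with unknown parents and sex (0) and missing phenotype (-9), and drops the phase information. As in the pasted rows, INFO is missing, so the sketch writes "." there.

```python
# The five chr22 sites from the question; INFO is written as "." because a
# VCF data line needs a value in every fixed column.
VCF_TEXT = """\
##fileformat=VCFv4.0
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA12891 NA12892 NA12878 NA19239 NA19238 NA19240
22 47812545 rs5769818 A G . PASS . GT 0|1 0|1 0|1 1|1 1|1 1|1
22 47812939 rs9616222 A G . PASS . GT 0|1 0|1 0|1 1|0 1|1 1|1
22 47813002 rs5769819 A G . PASS . GT 0|1 0|1 0|1 1|0 1|1 1|1
22 47813051 rs5769820 G A . PASS . GT 1|0 1|0 1|0 1|0 1|1 1|1
22 47813163 rs5769821 A G . PASS . GT 0|1 0|1 0|1 1|0 1|1 1|1
"""


def vcf_to_ped_map(lines):
    """Convert diploid VCF GT calls to PLINK PED and MAP text lines.

    PED: FID IID PAT MAT SEX PHENO, then two allele columns per variant.
    MAP: chromosome, variant ID, genetic distance (0 = unknown), bp position.
    """
    samples, map_rows, sites = [], [], []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("##"):
            continue
        if fields[0] == "#CHROM":
            samples = fields[9:]
            continue
        chrom, pos, var_id, ref, alt = fields[:5]
        alleles = [ref] + alt.split(",")  # GT index 0 = REF, 1.. = ALT
        gt_idx = fields[8].split(":").index("GT")
        map_rows.append(f"{chrom}\t{var_id}\t0\t{pos}")
        site = []
        for call in fields[9:]:
            # "|" (phased) is treated like "/", so phase is not carried over
            a1, a2 = call.split(":")[gt_idx].replace("|", "/").split("/")
            # PLINK codes a missing allele as 0
            site.append(tuple("0" if a == "." else alleles[int(a)] for a in (a1, a2)))
        sites.append(site)
    ped_rows = []
    for s, name in enumerate(samples):
        calls = " ".join(f"{site[s][0]} {site[s][1]}" for site in sites)
        ped_rows.append(f"{name} {name} 0 0 0 -9 {calls}")
    return ped_rows, map_rows


if __name__ == "__main__":
    ped, map_ = vcf_to_ped_map(VCF_TEXT.splitlines())
    print("\n".join(ped))
    print("\n".join(map_))
```

For NA12891, the first PED line comes out as "NA12891 NA12891 0 0 0 -9 A G A G A G A G A G". If a PED produced from the real file shows only 0s instead, the genotypes were read as missing somewhere along the way.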
It can handle phased data, as I show in Step 5 here: Produce PCA bi-plot for 1000 Genomes Phase III - Version 2 (the 1000 Genomes data are all phased). I will move my answer back to a comment. Perhaps the PLINK developer will pick this up later (they are in a different time zone).
Ah, for now I was able to sort out my issue using the R package SNPRelate. It takes the VCF file as input and calculates IBS, and, if needed, it can also convert the data to PLINK format!