Dear all,
I have a VCF file containing SNP data for multiple individuals, most of which are haploid while some are diploid. I want to "haploidize" the diploid individuals, meaning that I want to randomly pick one of the two alleles at every locus for those individuals.
What kind of tools can I use to do that? Or what scripting approach should I adopt?
For example,
scaffold1 25042 . G A 13300.6 PASS AC=5;AF=0.179;AN=28;BaseQRankSum=-5.920e-01;ClippingRankSum=0.373;DP=1268;ExcessHet=0.7918;FS=5.925;MQ=57.37;MQRankSum=-1.031e+00;QD=32.39;ReadPosRankSum=0.943;SOR=1.255 GT:AD:DP:GQ:PL 0/0:41,0:41:90:0,90,1528 0/1:13,16:29:99:616,0,498 0:56,0:56:99:0,1800 0:66,0:66:99:0,1800 0:82,0:82:99:0,1800 0/0:19,0:19:33:0,33,495
Here the 1st, 2nd and 6th individuals are diploid while the others are haploid. I want to randomly take one of the two alleles for the diploid individuals. The resulting file could be another VCF file or a tab-separated file.
Any suggestions?
Thank you very much in advance.
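One possible scripting approach, as a minimal Python sketch rather than a tested tool: read the VCF line by line and, for every sample whose GT has two alleles, keep one of them at random. The function names (haploidize_gt, haploidize_line, haploidize_vcf) are made up for illustration. The sketch assumes an uncompressed, tab-delimited VCF in which GT is listed in the FORMAT column. Since AD, PL and the other per-sample fields describe the original diploid calls, it writes GT only; INFO values such as AC, AN and AF are copied unchanged and will no longer match the new genotypes.

```python
import random


def haploidize_gt(gt, rng):
    """Return one randomly chosen allele from a GT string such as '0/1' or '1|0'.

    Haploid calls ('0', '1', '.') are returned unchanged.
    """
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) == 1:
        return gt
    return rng.choice(alleles)


def haploidize_line(line, rng):
    """Haploidize every sample on one VCF line; header lines pass through."""
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        # No FORMAT/sample columns: nothing to haploidize.
        return line
    gt_index = fields[8].split(":").index("GT")
    samples = [haploidize_gt(s.split(":")[gt_index], rng) for s in fields[9:]]
    # Keep the eight fixed columns, reduce FORMAT to GT, write haploid calls.
    return "\t".join(fields[:8] + ["GT"] + samples) + "\n"


def haploidize_vcf(in_path, out_path, seed=None):
    """Write a haploidized copy of in_path to out_path (seed for reproducibility)."""
    rng = random.Random(seed)
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(haploidize_line(line, rng))
```

For example, haploidize_vcf("input.vcf", "haploid.vcf", seed=42) would write a reproducible haploidized copy; the file names are placeholders. A missing diploid call such as ./. becomes a single dot.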
If you are working on the same problem but with PLINK ped files, here's my approach after recoding alleles as 0/1/2: