Good evening, I am working on whole-genome sequencing data and I am struggling a bit with plink. After merging all my VCF files into a plink .bed file, I need to exclude the SNPs that are not in Hardy-Weinberg (HW) equilibrium. Because I merged many whole-genome sequencing VCF files, my plink file is full of NA, so I need to encode my SNPs as 0/1/2 (setting the missing genotypes to homozygous for the reference allele) before checking the HW condition. After that I need to recode my file as a .traw file (the output of plink --recode A-transpose, which has one column per individual and one line per SNP). I guess I would need something like recode12 --fill-missing-a2, but recode12 only outputs a ped (or tped) file, which is not what I need. Does anyone know a possible solution? In short, I need to encode my plink .bed file as 0/1/2 (2 standing for homozygosity for the minor allele, and 0 filling all the NA) in order to check HW equilibrium, and then convert it to a matrix with one individual per column and one SNP per line. Thanks
Did you try --recode A-transpose after the recode12 --fill-missing-a2?
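If it helps, the same idea can also be sketched outside plink, on the .traw text itself. This is only a minimal illustration, not plink's own method: it assumes a tab-separated .traw with six metadata columns (CHR, SNP, (C)M, POS, COUNTED, ALT) before the per-individual columns, it treats every NA as 0, and it uses a simple chi-square approximation rather than an exact HWE test. The function names and the default threshold are hypothetical.

```python
import math

N_META = 6  # assumed .traw layout: CHR, SNP, (C)M, POS, COUNTED, ALT


def fill_missing(genotypes):
    """Convert genotype strings to ints, treating NA as 0 copies of the counted allele."""
    return [0 if g == "NA" else int(g) for g in genotypes]


def hwe_chi2_pvalue(genos):
    """Chi-square (1 df) HWE p-value from 0/1/2 allele counts.

    This is an approximation, not an exact test.
    """
    n = len(genos)
    n_het = genos.count(1)
    n_hom_counted = genos.count(2)
    n_hom_other = n - n_het - n_hom_counted
    p = (2 * n_hom_counted + n_het) / (2 * n)  # frequency of the counted allele
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site: no deviation can be measured
    expected = (q * q * n, 2 * p * q * n, p * p * n)
    observed = (n_hom_other, n_het, n_hom_counted)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))


def filter_traw(lines, threshold=1e-6):
    """Return the header plus the NA-filled rows whose HWE p-value is >= threshold."""
    header, *rows = [ln.rstrip("\n").split("\t") for ln in lines]
    kept = [header]
    for fields in rows:
        genos = fill_missing(fields[N_META:])
        if hwe_chi2_pvalue(genos) >= threshold:
            kept.append(fields[:N_META] + [str(g) for g in genos])
    return kept
```

To use it, pass the lines of the .traw file, for example `filter_traw(open("merged.traw"))`, where `merged.traw` is a placeholder name. Keep in mind that filling NA with 0 adds homozygotes that were never actually genotyped, which can itself push SNPs away from HW equilibrium.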
Thanks stolarek.ir for your answer. Your solution seems interesting and I will definitely try it next time. However, since I am working on quite a big set of whole-genome sequencing files, I didn't have the resources (in terms of time and disk space) to genotype all the VCFs against all the positions present in all the samples, so we decided to move forward at this stage without filtering out the SNPs that are out of HW equilibrium. But thanks anyway for your solution.
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.