Hi there,
I was trying to split a VCF file into two subsets of samples and convert both to PLINK binary format, but I ran into a problem after the conversion.
Here is what I did.
First, I split the VCF file by sample with this awk command:

    awk 'BEGIN{while((getline id < "part1.id") > 0) list[id]=1} NR==1,/#CHROM/{if(x!="")print x;x=$0} /#CHROM/,EOF{printf "%s",$1;for(n=2;n<=9;n++)printf "\t%s",$n;for(m=10;m<=NF;m++){if(list[$m]||listA[m]){listA[m]=1;printf "\t%s",$m}}print ""}' all.vcf > part1.vcf
part1.id contains a subset of the sample IDs in all.vcf, one per line. part2.id and part2.vcf were produced the same way.
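To rule out the split step, a quick check like the following can confirm that each sub-VCF contains exactly the samples in its ID list. This is a minimal sketch assuming tab-separated VCFs and one ID per line in the .id file; check_split is a hypothetical helper name:

```shell
# Hypothetical helper: compare the sample columns of a VCF with an ID list.
check_split() {
  vcf=$1
  ids=$2
  tmp=$(mktemp -d)
  # Sample IDs from the #CHROM header line (columns 10 onward), one per line.
  awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$vcf" | sort > "$tmp/samples"
  # The ID list, with any Windows line endings removed.
  tr -d '\r' < "$ids" | sort > "$tmp/ids"
  if cmp -s "$tmp/samples" "$tmp/ids"; then
    echo "OK: $vcf has exactly the samples listed in $ids"
    status=0
  else
    echo "MISMATCH: $vcf vs $ids (< only in VCF, > only in ID list)"
    diff "$tmp/samples" "$tmp/ids" || true
    status=1
  fi
  rm -rf "$tmp"
  return "$status"
}
```

Running "check_split part1.vcf part1.id" and "check_split part2.vcf part2.id" should print OK for both. The IDs are compared after sorting because the awk command keeps the original column order of all.vcf, which need not match the order of the ID list.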
I then converted each split VCF (part1.vcf and part2.vcf) to plain-text PLINK format with vcftools 0.1.13, and converted the result to PLINK binary format with PLINK 1.9, for example:

    vcftools --vcf part1.vcf --plink --out part1
    plink --file part1 --make-bed --out part1
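One thing worth ruling out: by default PLINK 1.9 codes A1 as the minor allele of whatever dataset it loads, so two sample subsets with different allele frequencies can end up with A1 and A2 swapped for the same SNP even though the genotypes agree. A minimal sketch of an alternative conversion, assuming PLINK 1.9 is on the PATH as "plink", reads each VCF directly and adds --keep-allele-order so that A2 stays the VCF REF allele in both parts (vcf_to_bed is a hypothetical helper name):

```shell
# Hypothetical helper: convert <prefix>.vcf straight to <prefix>.bed/.bim/.fam
# with PLINK 1.9, skipping the intermediate vcftools --plink (.ped/.map) step.
vcf_to_bed() {
  for prefix in "$@"; do
    # --keep-allele-order: keep REF as A2 and ALT as A1 instead of letting
    # PLINK recode A1 as the minor allele of this particular subset.
    plink --vcf "${prefix}.vcf" --keep-allele-order --make-bed --out "$prefix" || return 1
  done
}
```

Usage: "vcf_to_bed part1 part2". Adding --keep-allele-order to the existing "plink --file" step alone may not be enough, because a .ped file does not record which allele is REF.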
The whole process ran without any errors. However, the two .bim files produced by PLINK 1.9 differ: for some SNPs the alleles in the last two columns (A1 and A2) are swapped, even though part1 and part2 were extracted from the same VCF file. For example:
    part1.bim: 7 AX-272507051 0 99933869 G A
    part2.bim: 7 AX-272507051 0 99933869 A G
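To see how widespread the difference is, the two .bim files can be compared by variant ID (column 2). This sketch, with compare_bim as a hypothetical helper name, prints each shared variant whose A1/A2 columns (5 and 6) are exactly swapped or otherwise different, for example when a SNP is monomorphic in one subset and PLINK writes 0 for the missing allele:

```shell
# Hypothetical helper: list variants (matched by ID, column 2) whose alleles
# (column 5 = A1, column 6 = A2) differ between two .bim files.
compare_bim() {
  awk '
    NR == FNR { a1[$2] = $5; a2[$2] = $6; next }     # first file: remember alleles
    !($2 in a1) { next }                             # skip IDs absent from the first file
    a1[$2] == $5 && a2[$2] == $6 { next }            # same coding in both files
    a1[$2] == $6 && a2[$2] == $5 { print $2, "swapped"; next }
    { print $2, "different" }                        # e.g. a 0 allele for a monomorphic SNP
  ' "$1" "$2"
}
```

For example, "compare_bim part1.bim part2.bim | wc -l" counts the affected variants.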
Has anyone else run into this? I'm not sure whether the problem comes from vcftools or from PLINK. How can I check the .bed files?
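To check the binary files against the source data, each .bim can be compared with the REF/ALT columns of all.vcf. The sketch below (check_bim_vs_vcf is a hypothetical name) assumes biallelic sites and that the VCF ID column matches the .bim variant IDs, as it appears to for the AX- probes; it prints every variant whose A2 is not the VCF REF allele:

```shell
# Hypothetical helper: report .bim variants whose A2 (column 6) is not the
# REF allele of the same ID in the VCF (column 3 = ID, 4 = REF, 5 = ALT).
# Assumes biallelic sites and unique, non-"." IDs in the VCF.
check_bim_vs_vcf() {
  awk -F'\t' '
    NR == FNR {                                      # first file: the VCF
      if ($0 !~ /^#/) { ref[$3] = $4; alt[$3] = $5 }
      next
    }
    ($2 in ref) && $6 != ref[$2] {
      print $2, "A1=" $5, "A2=" $6, "REF=" ref[$2], "ALT=" alt[$2]
    }
  ' "$1" "$2"
}
```

Usage: "check_bim_vs_vcf all.vcf part1.bim" and the same for part2.bim. If the swaps come from PLINK's default minor-allele coding, running "plink --bfile part1 --freq --out part1" (and likewise for part2) should show in the .frq file that, for the flagged SNPs, A1 is the less frequent allele within each subset.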
Thanks,