Hi!
I got my .vcf files after variant calling with GATK HaplotypeCaller.
I am new to PLINK and would like to know how to get a set of PLINK files (.ped and .map) from the VCF file for somatic cells. So far I have used the following:
plink --vcf file.vcf --recode --out PLINKfile
But then the .ped file contains information about only one of the alleles:
person1 person1 0 0 0 -9 G A G C C C ...
person2 person2 0 0 0 -9 G A G C T C ...
person3 person3 0 0 0 -9 G T C C C C ...
As I understand it, for SNPs there should be two letters at each position, one for each allele, so it should look like this:
person1 person1 0 0 0 -9 GA AA GG CT CC CC ...
person2 person2 0 0 0 -9 GA AA GG CC TC CC ...
person3 person3 0 0 0 -9 GA TA CG CT CC CC ...
How can I do that? Also, is there a way to encode insertions and deletions, especially those longer than one nucleotide?
Thank you
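PED/MAP is a very outdated, generally poor and memory-inefficient way to store genotype data. I would recommend using another format if you can, for example PLINK's binary .bed/.bim/.fam fileset (plink --vcf file.vcf --make-bed --out PLINKfile).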
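For reference, a PED file stores each variant as two whitespace-separated allele columns after the six leading sample columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype). So "G A G C C C" in the output above is three genotypes (G/A, G/C, C/C), not one allele per variant. If a compact one-token-per-genotype view is easier to read, a minimal awk sketch like the one below can join each allele pair. It assumes whitespace-delimited PED input; the function name join_ped_alleles is hypothetical, and the joined output is meant as a human-readable view, not as input to feed back into PLINK.

```shell
# Hypothetical helper: join each pair of allele columns in a PED file
# into one compact genotype (e.g. "G A" -> "GA"), for inspection only.
# Assumes whitespace-delimited PED lines on stdin: 6 sample columns,
# then 2 allele columns per variant.
join_ped_alleles() {
  awk '{
    line = $1
    for (i = 2; i <= 6; i++) line = line " " $i                 # FID IID PID MID SEX PHENO
    for (i = 7; i < NF; i += 2) line = line " " $i $(i + 1)     # allele1 allele2 -> one genotype
    print line
  }'
}

# Usage: join_ped_alleles < PLINKfile.ped
```

Applied to the first line of the output above, this prints "person1 person1 0 0 0 -9 GA GC CC".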