Hello,
I have data from a sequencing run on an Ion PGM.
We sequenced 96 barcodes (individuals) and 310 amplicons (chromosomal regions).
32 barcodes are controls and 64 are cases.
We did the variant calling and got 96 VCF files.
We combined them into a single VCF file using GATK. The combined file contains 90 different SNPs.
We converted the single VCF file to plink format (map and ped files) using vcftools.
Now we want to use plink to run an association test.
The ped file looks like this:
1 1 0 0 0 2 0 0 0 0 0 0 0 0 0 0
2 2 0 0 0 1 0 0 0 0 0 0 0 0 T C
3 3 0 0 0 2 0 0 0 0 0 0 0 0 T C
4 4 0 0 0 2 0 0 C T 0 0 0 0 T C
5 5 0 0 0 1 0 0 0 0 0 0 0 0 0 0
6 6 0 0 0 2 0 0 0 0 0 0 0 0 0 0
7 7 0 0 0 2 0 0 0 0 0 0 0 0 0 0
8 8 0 0 0 1 0 0 0 0 0 0 0 0 T C
9 9 0 0 0 2 0 0 0 0 0 0 0 0 T C
10 10 0 0 0 2 0 0 0 0 0 0 0 0 T C
11 11 0 0 0 1 0 0 0 0 0 0 0 0 0 0
12 12 0 0 0 2 0 0 C T 0 0 0 0 T C
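To put a number on the missingness, here is a rough Python sketch that counts missing calls per SNP. It assumes the usual ped layout (6 leading columns, then two allele columns per SNP, with "0" as the missing code); the function name `count_missing` is just something I made up for illustration.

```python
# Sketch: count missing genotype calls per SNP in a PLINK .ped file.
# Assumes 6 leading columns (FID, IID, PAT, MAT, SEX, PHENO) followed by
# two allele columns per SNP, with "0" as the missing allele code.

def count_missing(ped_lines):
    """Return a list with the number of samples missing each SNP."""
    counts = []
    for line in ped_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        alleles = fields[6:]
        n_snps = len(alleles) // 2
        if not counts:
            counts = [0] * n_snps
        for i in range(n_snps):
            a1, a2 = alleles[2 * i], alleles[2 * i + 1]
            if a1 == "0" or a2 == "0":
                counts[i] += 1
    return counts

# The first four rows of the ped file shown above.
example = [
    "1 1 0 0 0 2 0 0 0 0 0 0 0 0 0 0",
    "2 2 0 0 0 1 0 0 0 0 0 0 0 0 T C",
    "3 3 0 0 0 2 0 0 0 0 0 0 0 0 T C",
    "4 4 0 0 0 2 0 0 C T 0 0 0 0 T C",
]
print(count_missing(example))  # [4, 3, 4, 4, 1]
```

So in these four rows, three of the five SNPs have no call in any sample.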
As you can see, there are a lot of missing genotypes. I would like to know what the standard approach is in this case.
Should we assume that the missing genotypes are homozygous reference? Most of them probably are, but others could be genuinely missing data, and I guess the only way to tell them apart is to check the BAM files.
If we treat the missing genotypes as reference, is there a plink command to fill them in automatically?
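To make the question concrete, this is roughly the operation I have in mind, written as a minimal Python sketch rather than a plink feature. The function `fill_missing_with_ref` and the `ref_alleles` list are hypothetical: the reference alleles would have to come from the REF column of the combined VCF, in the same order as the map file, and the values used below are made-up placeholders. Note that it blindly treats every missing call as homozygous reference, which is exactly the assumption I am unsure about.

```python
# Sketch: replace missing genotypes ("0 0") in one .ped line with the
# homozygous reference genotype. ref_alleles is a hypothetical list with
# one reference allele per SNP, in .map order.

def fill_missing_with_ref(ped_line, ref_alleles):
    fields = ped_line.split()
    head, alleles = fields[:6], fields[6:]
    if len(alleles) != 2 * len(ref_alleles):
        raise ValueError("number of SNPs does not match ref_alleles")
    for i, ref in enumerate(ref_alleles):
        # Only fill calls where both alleles are missing.
        if alleles[2 * i] == "0" and alleles[2 * i + 1] == "0":
            alleles[2 * i] = ref
            alleles[2 * i + 1] = ref
    return " ".join(head + alleles)

# Placeholder reference alleles, NOT the real ones for these SNPs.
ref_alleles = ["A", "C", "G", "A", "T"]
line = "4 4 0 0 0 2 0 0 C T 0 0 0 0 T C"
print(fill_missing_with_ref(line, ref_alleles))
# 4 4 0 0 0 2 A A C T G G A A T C
```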
Thanks,
Cristian