I have a VCF file containing SNPs called from a trio (two parents and one child). What format should the PED file follow so that I can use it as input to GATK PhaseByTransmission?
The GATK/PLINK forums list the following as essential columns:
Family ID
Individual ID
Paternal ID
Maternal ID
Sex (1=male; 2=female; other=unknown)
Phenotype
So I created a simple PED file (input.ped) as follows:
F1 P 0 0 1 1
F1 M 0 0 2 1
F1 H1a P M 1 1
F1 H1b P M 1 1
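As a quick sanity check on the file above, here is a minimal Python sketch that confirms each row has the six listed columns and that every non-zero parent ID has its own row in the same family. It assumes the PLINK convention that `0` in a parent column means "no parent recorded"; the function names `parse_ped` and `missing_parents` are made up for illustration and are not part of GATK or PLINK.

```python
# Column order as listed in the post.
PED_COLUMNS = ["family", "individual", "father", "mother", "sex", "phenotype"]


def parse_ped(text):
    """Parse whitespace-delimited PED text into a list of dicts (one per row)."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != len(PED_COLUMNS):
            raise ValueError(
                f"line {line_no}: expected {len(PED_COLUMNS)} columns, got {len(fields)}"
            )
        records.append(dict(zip(PED_COLUMNS, fields)))
    return records


def missing_parents(records):
    """Return (individual, parent) pairs whose parent has no row in the same family.

    Assumption: "0" marks a parent that is not recorded (PLINK convention).
    """
    known = {(r["family"], r["individual"]) for r in records}
    missing = []
    for r in records:
        for parent in (r["father"], r["mother"]):
            if parent != "0" and (r["family"], parent) not in known:
                missing.append((r["individual"], parent))
    return missing


ped_text = """\
F1 P 0 0 1 1
F1 M 0 0 2 1
F1 H1a P M 1 1
F1 H1b P M 1 1
"""

records = parse_ped(ped_text)
children = [r["individual"] for r in records if r["father"] != "0"]

print(missing_parents(records))  # prints []
print(children)                  # prints ['H1a', 'H1b']
```

Note that this file describes two children (H1a and H1b) of the same parents, so the check only confirms internal consistency; it says nothing about how PhaseByTransmission treats families with more than one child.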
Do I need to follow any convention when naming my samples in my input.vcf when I run the following:
java -Xmx2g -jar GenomeAnalysisTK.jar \
-R ref.fasta \
-T PhaseByTransmission \
-V input.vcf \
-ped input.ped \
-o output.vcf
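One way to check the naming before running the command is to compare the Individual IDs in the PED file with the sample names in the VCF header. The sketch below assumes the two are matched by exact string equality, which is an assumption to verify against the GATK documentation rather than a confirmed behaviour. The header line is a hypothetical example, and `vcf_samples` and `compare_ids` are illustrative names.

```python
def vcf_samples(header_line):
    """Return sample names from a VCF '#CHROM' header line.

    Per the VCF layout, the first nine tab-separated columns are fixed
    (#CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT); the rest are samples.
    """
    fields = header_line.rstrip("\n").split("\t")
    if fields[0] != "#CHROM":
        raise ValueError("not a #CHROM header line")
    return fields[9:]


def compare_ids(ped_ids, vcf_ids):
    """Return (IDs only in the PED file, IDs only in the VCF header), each sorted."""
    ped_only = sorted(set(ped_ids) - set(vcf_ids))
    vcf_only = sorted(set(vcf_ids) - set(ped_ids))
    return ped_only, vcf_only


# Hypothetical header: a VCF that contains only three of the four PED individuals.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tP\tM\tH1a"
ped_ids = ["P", "M", "H1a", "H1b"]  # column 2 of input.ped

result = compare_ids(ped_ids, vcf_samples(header))
print(result)  # prints (['H1b'], [])
```

Any ID that appears in only one of the two lists is worth resolving before phasing, since the pedigree can only be applied to samples that are present under the same name in both files.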
The question is about one child and two parents?