PED file format for use with GATK PhaseByTransmission
9.6 years ago

I have a VCF file containing SNPs called from a trio (two parents and one child). What format should the PED file follow when used as input to GATK PhaseByTransmission?

The GATK/PLINK forums list the following as essential columns:

Family ID
Individual ID
Paternal ID
Maternal ID
Sex (1=male; 2=female; other=unknown)
Phenotype

So I created a simple PED file (input.ped) as follows:

F1      P      0       0       1       1
F1      M      0       0       2       1
F1      H1a    P       M       1       1
F1      H1b    P       M       1       1
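A quick consistency check on a PED file like this one can be sketched in shell. This is only a sketch under stated assumptions: `check_ped` is a hypothetical helper name (not part of GATK or PLINK), the file is assumed to be whitespace-delimited with exactly the six columns listed above, and "0" is assumed to mean "no parent recorded".

```shell
# check_ped: hypothetical helper, not part of GATK or PLINK.
# Verifies that every row has six whitespace-separated fields and that
# each non-zero parent ID also appears as an Individual ID in the file.
check_ped() {
  awk '
    BEGIN { bad = 0 }
    NF != 6 {
      printf "line %d: expected 6 fields, found %d\n", NR, NF
      bad = 1
      next
    }
    { seen[$2] = 1; father[NR] = $3; mother[NR] = $4 }
    END {
      for (i = 1; i <= NR; i++) {
        if (!(i in father)) continue   # row was already reported as malformed
        if (father[i] != "0" && !(father[i] in seen)) {
          printf "line %d: father %s has no row of its own\n", i, father[i]
          bad = 1
        }
        if (mother[i] != "0" && !(mother[i] in seen)) {
          printf "line %d: mother %s has no row of its own\n", i, mother[i]
          bad = 1
        }
      }
      exit bad
    }
  ' "$1"
}

# Example: the pedigree from the question, written to a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' \
  'F1 P 0 0 1 1' \
  'F1 M 0 0 2 1' \
  'F1 H1a P M 1 1' \
  'F1 H1b P M 1 1' > "$workdir/input.ped"

check_ped "$workdir/input.ped" && echo "PED looks consistent"
```

The function exits non-zero and prints one line per problem, so it can be used as a guard before launching a longer job.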

Do I need to follow any convention when naming the samples in input.vcf when I run the following command?

java -Xmx2g -jar GenomeAnalysisTK.jar \
   -R ref.fasta \
   -T PhaseByTransmission \
   -V input.vcf \
   -ped input.ped \
   -o output.vcf
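On the naming question, one thing that can be checked up front is whether every Individual ID in the PED file also appears as a sample name in the VCF header, since the pedigree can only be applied to samples it can find by name. The sketch below assumes a standard tab-delimited `#CHROM` header line with sample names from column 10 onward. `missing_from_vcf` is a hypothetical helper name, and the tiny VCF header is made-up demonstration data that deliberately omits H1b.

```shell
# missing_from_vcf: hypothetical helper, not part of GATK.
# Prints every Individual ID (PED column 2) that is absent from the
# sample columns of the VCF "#CHROM" header line.
# usage: missing_from_vcf input.ped input.vcf
missing_from_vcf() {
  ped=$1
  vcf=$2
  # Read only as far as the header line, then stop.
  header=$(awk '/^#CHROM/ { print; exit }' "$vcf")
  VCF_HEADER=$header awk '
    BEGIN {
      n = split(ENVIRON["VCF_HEADER"], f, "\t")
      for (i = 10; i <= n; i++) in_vcf[f[i]] = 1   # columns 10+ are samples
    }
    NF && !($2 in in_vcf) { print $2 }
  ' "$ped"
}

# Demonstration data (made up): the PED from the question and a VCF
# header that contains P, M and H1a but not H1b.
workdir=$(mktemp -d)
printf '%s\n' \
  'F1 P 0 0 1 1' \
  'F1 M 0 0 2 1' \
  'F1 H1a P M 1 1' \
  'F1 H1b P M 1 1' > "$workdir/input.ped"
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tP\tM\tH1a\n' \
  > "$workdir/input.vcf"

missing_from_vcf "$workdir/input.ped" "$workdir/input.vcf"   # prints: H1b
```

An empty result means every pedigree member was found in the VCF header.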
GATK SNP next-gen-sequencing • 5.6k views
8.9 years ago
ebrown1955 ▴ 320

Unfortunately, PhaseByTransmission will only work on trios. Since your PED file lists two children, you'll have to run PBT twice, once per child, with each run given a "trio" of the two parents plus that one child.
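Splitting a multi-child PED file into one PED file per trio could be sketched as follows. This is an illustration under assumptions, not a tested GATK workflow: `split_trios` and the `trio_<child>.ped` and `output_<child>.vcf` names are hypothetical, a child is taken to be any row with both parent IDs non-zero, and the loop only prints the PhaseByTransmission command from the question rather than running it. Whether the VCF also needs to be subset to the three samples of each trio is not verified here.

```shell
# split_trios: hypothetical helper, not part of GATK.
# Writes one PED file per child (father, mother, child rows).
# usage: split_trios input.ped outdir
split_trios() {
  awk -v outdir="$2" '
    NF == 6 { row[$2] = $0 }
    NF == 6 && $3 != "0" && $4 != "0" {
      child[++n] = $2; dad[n] = $3; mom[n] = $4
    }
    END {
      for (i = 1; i <= n; i++) {
        # Skip children whose parents have no row of their own.
        if (!(dad[i] in row) || !(mom[i] in row)) continue
        out = outdir "/trio_" child[i] ".ped"
        print row[dad[i]]   > out
        print row[mom[i]]   > out
        print row[child[i]] > out
        close(out)
      }
    }
  ' "$1"
}

# Demonstration with the pedigree from the question.
workdir=$(mktemp -d)
printf '%s\n' \
  'F1 P 0 0 1 1' \
  'F1 M 0 0 2 1' \
  'F1 H1a P M 1 1' \
  'F1 H1b P M 1 1' > "$workdir/input.ped"

split_trios "$workdir/input.ped" "$workdir"

# Print (not run) one PhaseByTransmission command per trio.
for ped in "$workdir"/trio_*.ped; do
  child=$(basename "$ped" .ped)
  child=${child#trio_}
  echo "java -Xmx2g -jar GenomeAnalysisTK.jar -R ref.fasta -T PhaseByTransmission -V input.vcf -ped $ped -o output_$child.vcf"
done
```

For the pedigree in the question this produces trio_H1a.ped and trio_H1b.ped, each with three rows.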

If I'm not mistaken, Beagle 4.0 currently accepts PED files with multiple trios and will output a VCF file with all phased genotypes in one pass.


The question is about a child and two parents??

8.9 years ago
Len Trigg ★ 1.6k

Pedigree-aware variant calling is one of the strengths of the Real Time Genomics commands available as part of RTG Core.

You can run simple pedigree-based phasing by transmission on an existing VCF call set using an expert option of the rtg mendelian tool, e.g.:

rtg mendelian -t ref.sdf --pedigree input.ped --input input.vcf --output output.vcf --Xphase

This will phase all offspring calls where possible.

If you have the option to re-run the variant calling itself, you can use rtg family (or rtg population if you have a mixture of families, multi-generation pedigrees, and unrelated samples), which will perform pedigree-aware variant calling. The benefits are that the pedigree informs the Bayesian variant calling itself, pedigree-phased calls appear in the output automatically, and de novo variants are marked.
