Hello,
I am currently trying to identify de novo mutations in my trio data (the parents are unaffected and the child is affected). I ran PhaseByTransmission, but none of the de novo mutations (child heterozygous, both parents hom. ref) were phased, i.e. I am getting '/' instead of '|'. Is this an error? When I look for autosomal recessive variants, they are phased correctly. What is wrong with my analysis? I am pasting the command and the summary printed by PhaseByTransmission below. Please also comment on the summary results: do they look odd?
Please help.
java -jar /gatk_3.3/GenomeAnalysisTK.jar -R /reference_sequence/human_g1k_v37.fasta -T PhaseByTransmission -V trio1.vcf -ped trio1.ped --DeNovoPrior 0.00001 -o trio_out.vcf --MendelianViolationsFile mendelian_violation.vcf
INFO 20:04:04,201 GenomeAnalysisEngine - Strictness is SILENT
INFO 20:04:04,341 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 20:04:04,453 PedReader - Reading PED file trio1.ped with missing fields: []
INFO 20:04:04,457 PedReader - Phenotype is other? false
INFO 20:04:04,510 GenomeAnalysisEngine - Preparing for traversal
INFO 20:04:04,530 GenomeAnalysisEngine - Done preparing for traversal
INFO 20:04:04,531 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 20:04:04,531 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 20:04:04,532 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 20:04:34,824 ProgressMeter - 15:96876611 147844.0 30.0 s 3.4 m 77.5% 38.0 s 8.0 s
INFO 20:04:43,701 PhaseByTransmission - Number of complete trio-genotypes: 139299
INFO 20:04:43,702 PhaseByTransmission - Number of trio-genotypes containing no call(s): 0
INFO 20:04:43,703 PhaseByTransmission - Number of trio-genotypes phased: 124651
INFO 20:04:43,703 PhaseByTransmission - Number of resulting Het/Het/Het trios: 13391
INFO 20:04:43,704 PhaseByTransmission - Number of remaining single mendelian violations in trios: 937
INFO 20:04:43,704 PhaseByTransmission - Number of remaining double mendelian violations in trios: 12
INFO 20:04:43,704 PhaseByTransmission - Number of complete pair-genotypes: 0
INFO 20:04:43,705 PhaseByTransmission - Number of pair-genotypes containing no call(s): 0
INFO 20:04:43,705 PhaseByTransmission - Number of pair-genotypes phased: 0
INFO 20:04:43,705 PhaseByTransmission - Number of resulting Het/Het pairs: 0
INFO 20:04:43,706 PhaseByTransmission - Number of remaining mendelian violations in pairs: 0
INFO 20:04:43,706 PhaseByTransmission - Number of genotypes updated: 4395
INFO 20:04:45,481 ProgressMeter - done 201351.0 40.0 s 3.4 m 100.0% 40.0 s 0.0 s
INFO 20:04:45,482 ProgressMeter - Total runtime 40.95 secs, 0.68 min, 0.01 hours
INFO 20:04:47,002 GATKRunReport - Uploaded run statistics report to AWS S3
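For anyone wanting to sanity-check the summary above, here is a small Python sketch that pulls the PhaseByTransmission counts out of the log text and computes what fraction of complete trio-genotypes remain Mendelian violations. The helper name `parse_counts` and the regular expression are my own assumptions for illustration, not part of GATK; the numbers are copied from the log lines above.

```python
import re

# Trio summary lines copied from the log above.
LOG = """\
INFO 20:04:43,701 PhaseByTransmission - Number of complete trio-genotypes: 139299
INFO 20:04:43,702 PhaseByTransmission - Number of trio-genotypes containing no call(s): 0
INFO 20:04:43,703 PhaseByTransmission - Number of trio-genotypes phased: 124651
INFO 20:04:43,703 PhaseByTransmission - Number of resulting Het/Het/Het trios: 13391
INFO 20:04:43,704 PhaseByTransmission - Number of remaining single mendelian violations in trios: 937
INFO 20:04:43,704 PhaseByTransmission - Number of remaining double mendelian violations in trios: 12
"""


def parse_counts(log_text):
    """Map each 'Number of ...' label in the log to its integer count."""
    pattern = re.compile(r"PhaseByTransmission - (Number of [^:]+): (\d+)")
    return {name: int(value) for name, value in pattern.findall(log_text)}


counts = parse_counts(LOG)
complete = counts["Number of complete trio-genotypes"]
violations = (
    counts["Number of remaining single mendelian violations in trios"]
    + counts["Number of remaining double mendelian violations in trios"]
)
violation_fraction = violations / complete

print(f"{violations} of {complete} trio-genotypes remain violations "
      f"({violation_fraction:.2%})")
```

The violation count here includes every site that is inconsistent with Mendelian inheritance, so it is an upper bound on de novo candidates rather than a list of confirmed de novo mutations.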
Hi Vivek,
I really appreciate your help. Could you please guide me with some additional information?
Thanks a lot.
ReadBackedPhasing and PhaseByTransmission are two entirely different modules in terms of how they work, and they do not necessarily need to be used together. PBT works by applying a statistical prior before phasing the variants; RBP, on the other hand, constructs haplotype strings by leveraging linkage equilibrium over certain lengths and using reads that span multiple variant sites.
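To illustrate why transmission alone cannot phase a de novo call, here is a minimal Python sketch. It is not the actual PhaseByTransmission algorithm, which works on genotype likelihoods and the de novo prior; it only enumerates which parental alleles could have been transmitted. The function name `phase_child`, the tuple encoding of genotypes, and the `maternal|paternal` ordering of the result are assumptions made for this example.

```python
from itertools import product


def phase_child(mother, father, child):
    """Return the child's genotype as 'maternal|paternal' if Mendelian
    transmission determines it uniquely, otherwise None.

    Genotypes are tuples of allele indices, e.g. (0, 1) for a het call.
    """
    # Every (maternal allele, paternal allele) pair that reproduces
    # the child's unordered genotype.
    consistent = {
        (m, f)
        for m, f in product(set(mother), set(father))
        if sorted((m, f)) == sorted(child)
    }
    if len(consistent) == 1:
        m, f = next(iter(consistent))
        return f"{m}|{f}"
    # Zero consistent pairs: Mendelian violation (e.g. de novo candidate).
    # Two consistent pairs: ambiguous (e.g. Het/Het/Het).
    return None


# Alt allele inherited from a het mother: origin is unambiguous.
inherited = phase_child(mother=(0, 1), father=(0, 0), child=(0, 1))

# De novo pattern from the question: child het, both parents hom ref.
de_novo = phase_child(mother=(0, 0), father=(0, 0), child=(0, 1))

# All three het: either parent could have passed on the alt allele.
all_het = phase_child(mother=(0, 1), father=(0, 1), child=(0, 1))

print(inherited, de_novo, all_het)
```

In this sketch the de novo pattern returns `None` because neither parent carries the alt allele, so there is no transmission that could assign it to the maternal or paternal side. That matches the unphased '/' genotypes described in the question, and the Het/Het/Het case is unresolved for a different reason: two transmissions fit equally well.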
OK...got it...:)
Thank you very much, Vivek and Donfreed.