My VCF does not contain pedigree info, and as such, when I run PLINK, it does not compute.
VCFtools can convert VCF to .ped and .map, but my understanding is that it does not require a pedigree for the conversion and treats all individuals as unrelated (?).
I googled around but did not find any tool that can add the pedigree info ("PLINK-grade" pedigree info).
Any suggestions? Many thanks.
That isn't strictly correct: the VCF specification does describe the PEDIGREE header tag for storing inter-sample relationships. The RTG joint variant callers that use inter-sample relationships (e.g. tumor-normal calling, or germline calling with families or larger pedigrees) do output these headers. It can be very useful to check from the header exactly which pedigree was used during calling; in larger pedigrees it is fairly common to discover and correct errors in the pedigree. RTG includes the subcommands `pedfilter` and `pedstats`, which let you do simple conversion between the VCF representation of a pedigree and PED files, although I am not sure whether these meet the OP's needs for use with PLINK.

Thank you for that piece of information. I have not seen a VCF store pedigree data before, so this is new to me. Could you show me the formatting the RTG callers use for the pedigree info in the header? Also, is there any option for GATK or samtools to include pedigree info in a VCF? I for one would not mind having that option.
For a run of `rtg somatic`, the information regarding the tumor and normal samples is represented like this (here from a run on a DREAM challenge dataset):
When sample sex information is available (e.g. as used by our sex-aware variant calling), the sex is stored in the SAMPLE header. So, for something like an octet from the CEPH pedigree called with `rtg population`, it looks like:
And to convert to PED:
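For anyone without RTG at hand, the header-to-PED direction can be sketched in a few lines of Python. This is not what `pedfilter` does internally; it is a minimal sketch that assumes the trio-style `##PEDIGREE=<Child=...,Mother=...,Father=...>` form described in the VCF 4.1/4.2 specs plus `##SAMPLE=<ID=...,Sex=MALE|FEMALE>` lines, and it ignores any other relationship forms (such as tumor/normal entries). A VCF header carries no family ID, so `family_id` is a hypothetical placeholder, and the phenotype column is simply set to 0 (missing). The NA128xx IDs are just illustrative CEPH sample names.

```python
import re

# Sketch: VCF meta lines -> 6-column PED rows (FID IID PID MID SEX PHENO).
# Assumed header forms (not necessarily byte-identical to RTG output):
#   ##SAMPLE=<ID=NA12878,Sex=FEMALE>
#   ##PEDIGREE=<Child=NA12878,Mother=NA12892,Father=NA12891>
FIELD_RE = re.compile(r"(\w+)=([^,>]+)")
SEX_CODE = {"MALE": "1", "FEMALE": "2"}  # PED: 1=male, 2=female, 0=unknown


def parse_meta(line):
    """Return the key=value pairs inside ##TAG=<...> as a dict."""
    return dict(FIELD_RE.findall(line.split("=<", 1)[1]))


def vcf_header_to_ped(header_lines, family_id="FAM1"):
    """family_id is a hypothetical placeholder: the VCF header has no family ID."""
    sex, parents, samples = {}, {}, []
    for line in header_lines:
        if line.startswith("##SAMPLE=<"):
            meta = parse_meta(line)
            sex[meta["ID"]] = SEX_CODE.get(meta.get("Sex", "").upper(), "0")
            if meta["ID"] not in samples:
                samples.append(meta["ID"])
        elif line.startswith("##PEDIGREE=<"):
            meta = parse_meta(line)
            if "Child" in meta:  # other relationship forms are ignored here
                parents[meta["Child"]] = (meta.get("Father", "0"), meta.get("Mother", "0"))
        elif line.startswith("#CHROM"):
            # Sample columns follow the nine fixed columns (#CHROM ... FORMAT).
            for name in line.rstrip("\n").split("\t")[9:]:
                if name not in samples:
                    samples.append(name)
    rows = []
    for iid in samples:
        pid, mid = parents.get(iid, ("0", "0"))
        # Phenotype column set to 0 (missing).
        rows.append("\t".join([family_id, iid, pid, mid, sex.get(iid, "0"), "0"]))
    return rows


header = [
    "##fileformat=VCFv4.2",
    "##SAMPLE=<ID=NA12891,Sex=MALE>",
    "##SAMPLE=<ID=NA12892,Sex=FEMALE>",
    "##SAMPLE=<ID=NA12878,Sex=FEMALE>",
    "##PEDIGREE=<Child=NA12878,Mother=NA12892,Father=NA12891>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA12891\tNA12892\tNA12878",
]
for row in vcf_header_to_ped(header):
    print(row)
```

The rows have the same six leading columns as a PLINK .fam file, so the parent and sex columns are where the relatedness information ends up.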
And here is a round trip back to a minimal VCF header:
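Going the other way, from a PED file to a minimal VCF header, is just as short. Again this is a sketch under stated assumptions, not RTG's actual output: it assumes the Child/Mother/Father form of `##PEDIGREE` and `Sex=MALE`/`Sex=FEMALE` in `##SAMPLE` lines. PED sex codes 1 and 2 map to MALE and FEMALE, unknown sex (0) omits the attribute, founders (both parents 0) get no `##PEDIGREE` line, and the family ID and phenotype columns are dropped because these header lines have no field for them.

```python
# Sketch: PED rows -> minimal VCF header with ##SAMPLE / ##PEDIGREE lines.
# Assumed header forms (not necessarily byte-identical to RTG output):
#   ##SAMPLE=<ID=NA12878,Sex=FEMALE>
#   ##PEDIGREE=<Child=NA12878,Mother=NA12892,Father=NA12891>
SEX_NAME = {"1": "MALE", "2": "FEMALE"}  # PED sex codes; 0 = unknown
FIXED_COLS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def ped_to_vcf_header(ped_lines):
    """Convert whitespace-separated 6-column PED rows to VCF header lines."""
    header = ["##fileformat=VCFv4.2"]
    pedigree = []
    samples = []
    for line in ped_lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank and comment lines
        _fid, iid, pid, mid, sex = fields[:5]  # family ID and phenotype are dropped
        samples.append(iid)
        if sex in SEX_NAME:
            header.append(f"##SAMPLE=<ID={iid},Sex={SEX_NAME[sex]}>")
        else:
            header.append(f"##SAMPLE=<ID={iid}>")
        # Founders (both parents "0") get no ##PEDIGREE line.
        parts = [f"Child={iid}"]
        if mid != "0":
            parts.append(f"Mother={mid}")
        if pid != "0":
            parts.append(f"Father={pid}")
        if len(parts) > 1:
            pedigree.append("##PEDIGREE=<" + ",".join(parts) + ">")
    header.extend(pedigree)
    header.append("\t".join(FIXED_COLS + samples))
    return header


ped = [
    "FAM1 NA12891 0 0 1 0",
    "FAM1 NA12892 0 0 2 0",
    "FAM1 NA12878 NA12891 NA12892 2 0",
]
for line in ped_to_vcf_header(ped):
    print(line)
```

A real VCF would of course also need `##contig`, `##FORMAT` and similar lines; this only covers the sample and pedigree part.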
Sorry, I don't know whether GATK or samtools support this type of thing. (I would just say use our callers instead :-))
A quick search just now indicates that the format may change slightly in the upcoming VCF 4.3 spec, but it doesn't look like a big deal: https://github.com/samtools/hts-specs/issues/96
Thank you. I think GATK and samtools will support this if it becomes part of the specs. Until then, I guess I'll have to store pedigrees in PED files :)