Identifying De Novo Mutations from VCF files
6.3 years ago
jiangqi_1996 ▴ 10

Hi,

I'm new here. I have three VCF files (father, mother and child). How can I identify de novo mutations from these three files? I am trying GATK PhaseByTransmission, but I don't know how to convert the three VCF files into one ped file. Alternatively, could I write my own program to do this?

Thanks.

SNP • 4.4k views

Check out plink and plink/seq, especially pseq denovo. plink can create a plink ped file from a VCF file.
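On the ped file itself: as far as I know, the pedigree file that GATK's pedigree-aware tools read via -ped is not converted from the VCFs at all. It is a small six-column text file describing the family, which you can write by hand. A minimal sketch, assuming the samples are named father, mother and son in the VCF headers, and using a made-up family ID (FAM1):

```shell
# Columns: family ID, individual ID, father ID, mother ID, sex, phenotype.
# Sex: 1 = male, 2 = female. A parent ID of 0 means "not in this file".
# Phenotype -9 = missing. FAM1 is a hypothetical family ID.
ped=trio.ped
{
  printf 'FAM1\tfather\t0\t0\t1\t-9\n'
  printf 'FAM1\tmother\t0\t0\t2\t-9\n'
  printf 'FAM1\tson\tfather\tmother\t1\t-9\n'
} > "$ped"
cat "$ped"
```

The individual IDs have to match the sample names in the VCF, and the genotypes still need to be in a single multi-sample VCF, so the three per-sample files must be merged separately.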


Thank you for your reply, I will try it.


Thanks a lot! I am now running PossibleDeNovo.

6.3 years ago
Len Trigg ★ 1.6k

Here's how I would do it with RTG Tools. This assumes your samples are named "father", "mother", "son" with their calls contained in block-compressed, tabixed VCFs named father.vcf.gz, mother.vcf.gz, son.vcf.gz respectively:

rtg vcfmerge father.vcf.gz mother.vcf.gz son.vcf.gz \
  --add-header "##PEDIGREE=<Child=son,Mother=mother,Father=father>" \
  --add-header "##SAMPLE=<ID=son,Sex=MALE>" \
  --output trio.vcf.gz
rtg mendelian -t /path/to/referencegenome.sdf --input trio.vcf.gz \
  --lenient --output-inconsistent trio-non-mendelian.vcf.gz

Adjust appropriately if your child is female or for what your particular sample names are. The reference genome is used to adjust the Mendelian inheritance rules appropriately for the sex chromosomes, and is created as a one-off process via:

rtg format -o /path/to/referencegenome.sdf /path/to/referencegenome.fasta

(For typical human reference genomes the sex chromosomes will be automatically recognized by the format command)
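To adapt the merge step to other sample names or to a female child, the two header lines can be built from variables. This is only a sketch: the name "daughter" is a made-up example, and the values must match the sample names that actually appear in your VCF headers.

```shell
# Hypothetical sample names; replace with the names in your own VCFs.
child=daughter
mother=mother
father=father
child_sex=FEMALE   # MALE or FEMALE

pedigree_header="##PEDIGREE=<Child=${child},Mother=${mother},Father=${father}>"
sample_header="##SAMPLE=<ID=${child},Sex=${child_sex}>"

# Print the two lines so they can be checked before merging.
printf '%s\n' "$pedigree_header" "$sample_header"

# They would then be passed to the merge, e.g.:
# rtg vcfmerge father.vcf.gz mother.vcf.gz daughter.vcf.gz \
#   --add-header "$pedigree_header" --add-header "$sample_header" \
#   --output trio.vcf.gz
```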

Be aware that when your samples have been called separately, the same variant can end up represented differently in each VCF (particularly complex variants involving indels, or several variants in close proximity). This can make it look like there is Mendelian inconsistency when there actually is not.

The ideal solution is to jointly call all the members of your trio at once (preferably with a pedigree-aware caller like those in RTG Core) to ensure the variants are consistently represented across the trio. The next best solution is to use a Mendelian comparison tool that is aware of the representation issue, such as VBT. Failing that, apply external decomposition and normalization tools (there are many of these, included in packages such as vt, vcflib and bcftools) to the input VCFs prior to comparison.
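As a rough sketch of the last option using bcftools, assuming the per-sample files are named father.vcf.gz, mother.vcf.gz and son.vcf.gz and that bcftools and tabix are on your PATH. The function name normalize_trio and the .norm.vcf.gz output naming are made up for illustration:

```shell
# Hypothetical helper: split multiallelic records and left-align indels
# in each per-sample VCF, then index the result for merging.
# Usage: normalize_trio /path/to/reference.fasta sample1 [sample2 ...]
normalize_trio() {
  ref_fasta=$1
  shift
  for sample in "$@"; do
    # -f: reference FASTA used for left-alignment and normalization
    # -m -any: split multiallelic sites into biallelic records
    bcftools norm -f "$ref_fasta" -m -any -O z \
      -o "${sample}.norm.vcf.gz" "${sample}.vcf.gz" || return 1
    tabix -p vcf "${sample}.norm.vcf.gz" || return 1
  done
}
```

It would be called as normalize_trio /path/to/referencegenome.fasta father mother son, and the resulting .norm.vcf.gz files used in the merge instead of the originals. Normalization of this kind reduces representation differences but does not remove them all, particularly for complex variants.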


Thank you for your answer, this may be another approach I can try.
