The DRAGEN-GATK pipeline works great for a single sample. However, I would like to know if anyone has used this pipeline for a trio. If so, how did you do it? Hard filtering based on QUAL is recommended, but how do I get a multi-sample VCF from which the low-quality variants (QUAL < 10.43) have been removed?
Either the tool discovers the information automatically, or the user can supply pedigree/population/trio information after generating the gVCFs. This is what the joint calling workflow does on BaseSpace (Illumina), from my understanding. From the info on the website:
Joint Calling: The workflow will run joint calling of either a pedigree or a population for small variant calling. The inputs to this workflow are single-sample gVCF files generated by the DRAGEN Germline app. The DRAGEN Joint Genotyping app can automatically discover the files it needs to perform the requested analysis, otherwise input files may be individually specified.
Joint Calling (Multi-sample gVCF): The workflow will run joint calling of either a pedigree or a population for small variant calling starting from a multi-sample gVCF generated by a previous run of DRAGEN Joint Genotyping.
Combine GVCFs: The workflow will merge single-sample gVCF files into a multi-sample gVCF file without joint calling. This gVCF can be used as input into the DRAGEN Joint Calling.
SV de novo: The workflow will run SV joint calling to generate a multi-sample VCF file. The input to this workflow is either BAM or CRAM files from the DRAGEN Germline app.
CNV de novo: The workflow will run CNV joint calling to generate a multi-sample VCF file. The input to this workflow is *.tn.tsv files from the DRAGEN Germline app.
Expansion Hunter: The workflow will merge ExpansionHunter VCF files from the DRAGEN Germline app.
Input Files
For small variants: gVCFs
For SV: BAMs
For CNV: .tn.tsv files generated by the DRAGEN Germline app
For Expansion Hunter: repeat VCFs
Optional .ped Pedigree File
Output Files
VCF or gVCF containing small variant calls, SV calls, CNV calls, or EH calls.
Known Limitations
The maximum recommended number of input samples for this app is 100
The following may be the correct workflow:
Call gVCFs independently for each sample.
Merge all the gVCFs.
Run joint variant calling to generate a VCF (the tool can detect the sample information automatically, or the user can supply it).
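For the QUAL hard filter asked about in the question, here is a minimal Python sketch that drops records below a threshold from an already joint-called, uncompressed multi-sample VCF. The function name `filter_vcf_by_qual` and the example records are made up for illustration, and the 10.43 cutoff is simply the value quoted in the question. Whether that single-sample cutoff is appropriate for a joint-genotyped VCF is not something this thread establishes.

```python
def filter_vcf_by_qual(lines, min_qual=10.43):
    """Yield header lines unchanged and only records with QUAL >= min_qual."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # QUAL is the sixth column of a VCF record
        # Assumption: records with a missing QUAL (".") are dropped as well.
        if qual != "." and float(qual) >= min_qual:
            yield line


# Hypothetical three-sample VCF content, for illustration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tchild\tmother\tfather\n",
    "chr1\t100\t.\tA\tG\t50.0\t.\t.\tGT\t0/1\t0/1\t0/0\n",
    "chr1\t200\t.\tC\tT\t5.2\t.\t.\tGT\t0/1\t0/0\t0/0\n",
]
kept = list(filter_vcf_by_qual(example))
print("".join(kept), end="")
```

For a real file, the same generator can be fed an open file handle and its output written line by line to a new VCF.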
Thanks for your answer, but maybe I was not clear enough.
I would like to use the open-source pipeline made by GATK and Illumina and incorporated into the GATK tool (via the DRAGEN mode option).
It works very well for a single sample, but if I produce a gVCF for each sample and then try to combine them with, for example, GenomicsDBImport followed by GenotypeGVCFs, I obtain an aberrant result, in particular a very high number of de novo variants.
The GATK team confirmed to me that they have not tested this on trios and therefore have no answer.
But it does not matter in the end: using an open-source pipeline is more expensive than using the Illumina commercial pipeline in the cloud, so you might as well use the commercial pipeline, which is very fast and easy to use.
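To put a number on "a very high number of de novo variants", one rough check is to count sites where the child carries a non-reference allele that neither parent has. The sketch below does this on plain-text VCF lines; the helper names and sample names (`child`, `mother`, `father`) are hypothetical, and it looks only at the GT field, ignoring depth, genotype quality and sex chromosomes. Treat the result as a count of candidates, not a validated de novo call set.

```python
def alleles(gt):
    """Split a GT string such as '0/1' or '0|1' into a set of allele strings."""
    return set(gt.replace("|", "/").split("/"))


def is_candidate_de_novo(child_gt, mother_gt, father_gt):
    """True if the child has a non-reference allele absent from both parents."""
    child, mother, father = alleles(child_gt), alleles(mother_gt), alleles(father_gt)
    if "." in child | mother | father:
        return False  # skip sites where any genotype is missing
    return any(a != "0" and a not in mother and a not in father for a in child)


def count_candidate_de_novo(lines, child, mother, father):
    """Count candidate de novo sites for one trio in a multi-sample VCF."""
    count = 0
    column = None
    for line in lines:
        if line.startswith("##"):
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            column = {name: i for i, name in enumerate(fields)}
            continue
        gt_pos = fields[8].split(":").index("GT")
        gts = [fields[column[s]].split(":")[gt_pos] for s in (child, mother, father)]
        if is_candidate_de_novo(*gts):
            count += 1
    return count


# Hypothetical records, for illustration only.
vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tchild\tmother\tfather\n",
    "chr1\t100\t.\tA\tG\t50\t.\t.\tGT\t0/1\t0/1\t0/0\n",  # inherited from mother
    "chr1\t200\t.\tC\tT\t60\t.\t.\tGT\t0/1\t0/0\t0/0\n",  # candidate de novo
    "chr1\t300\t.\tG\tA\t40\t.\t.\tGT\t./.\t0/0\t0/0\n",  # missing genotype, skipped
]
n_candidates = count_candidate_de_novo(vcf, "child", "mother", "father")
print(n_candidates)
```

Comparing this count before and after a QUAL filter is one way to see whether low-quality sites account for the excess.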
How does that recipe leverage the pedigree information?
Either the tool discovers the information automatically, or the user can supply pedigree/population/trio information after generating the gVCFs. This is what the joint calling workflow does on BaseSpace (Illumina), from my understanding. From the info on the website:
The following may be the correct workflow:
joint calling step:
reference URL for workflow: https://basespace.illumina.com/apps/12867855 Needs registration.
reference URL for joint calling: https://support.illumina.com/content/dam/illumina-support/help/Illumina_DRAGEN_Bio_IT_Platform_v3_7_1000000141465/Content/SW/Informatics/Dragen/gVCFJointCallingExamples_fDG_dtREF.htm
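As for the optional .ped pedigree file listed among the inputs, a trio can be described with three rows in the common six-column PED layout (family ID, individual ID, father ID, mother ID, sex, phenotype). The sketch below builds those rows; the function name and sample names are hypothetical, and the exact layout a given tool expects should be checked against its own documentation.

```python
def trio_ped_lines(family, child, father, mother, child_sex=0, child_affected=True):
    """Return tab-separated PED rows for one trio.

    Sex codes: 1 = male, 2 = female, 0 = unknown.
    Phenotype codes: 2 = affected, 1 = unaffected, 0 = unknown.
    """
    child_pheno = 2 if child_affected else 1
    rows = [
        # Parents are founders, so their own parent IDs are "0".
        (family, father, "0", "0", 1, 0),
        (family, mother, "0", "0", 2, 0),
        (family, child, father, mother, child_sex, child_pheno),
    ]
    return ["\t".join(str(value) for value in row) for row in rows]


# Hypothetical sample names, for illustration only.
ped = trio_ped_lines("FAM1", "child", "father", "mother", child_sex=2)
print("\n".join(ped))
```

The individual IDs must match the sample names in the gVCF headers for the pedigree to be linked to the genotypes.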