Dragen-gatk for trio
3.0 years ago
quentin54520 ▴ 120

Hi everyone,

The DRAGEN-GATK pipeline works great for a single sample. However, I would like to know if anyone has used this pipeline for a trio. If so, how did you do it? Hard filtering based on QUAL is recommended, but how do I get a multi-sample VCF from which the low-quality variants (QUAL < 10.43) have been removed?

Thanks in advance,

Quentin.

Gatk germline dragen • 2.2k views
3.0 years ago

I would like to know if any have used this pipeline for a trio?

You map each sample and generate the GVCF files on the fly:

    (...)
    --enable-variant-caller true \
    --vc-emit-ref-confidence GVCF \

and then the 3 GVCFs are merged:

dragen -r refdir \
    --dbsnp dbsnp.vcf.gz \
    --output-directory OUT \
    --output-file-prefix prefix \
    --enable-joint-genotyping true \
    --variant s1.g.vcf.gz --variant s2.g.vcf.gz --variant s3.g.vcf.gz

but how to get a multisample vcf in which the bad quality variants (QUAL <10.43) have been removed?

This is post-processing, done with bcftools for example.
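
A minimal sketch of that post-processing step, assuming the joint-genotyped VCF is bgzipped and using the threshold quoted in the question. The function name and file names are hypothetical; `bcftools view -e` excludes records that match the expression.

```shell
# Drop records whose QUAL is below a threshold (default 10.43, the value
# quoted in the question). Function and file names are hypothetical.
filter_low_qual() {
    in_vcf="$1"
    out_vcf="$2"
    min_qual="${3:-10.43}"
    # -e excludes matching records; -Oz writes a bgzipped VCF
    bcftools view -e "QUAL<${min_qual}" -Oz -o "${out_vcf}" "${in_vcf}"
}

# Usage (hypothetical file names):
#   filter_low_qual prefix.vcf.gz prefix.qual_filtered.vcf.gz
#   bcftools index -t prefix.qual_filtered.vcf.gz
```

Excluding records outright removes them from the file; if you would rather keep them and only flag them in the FILTER column, `bcftools filter` with `-s` and `-e` is the usual alternative.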


How does that recipe leverage the pedigree information?


Either the tool discovers the information automatically, or the user can supply the pedigree/population/trio information after generating the gVCFs. As I understand it, this is what the joint calling workflow does on BaseSpace (Illumina). From the info on the website:

    Joint Calling: The workflow will run joint calling of either a pedigree or a population for small variant calling. The inputs to this workflow are single-sample gVCF files generated by the DRAGEN Germline app. The DRAGEN Joint Genotyping app can automatically discover the files it needs to perform the requested analysis, otherwise input files may be individually specified.
    Joint Calling (Multi-sample gVCF): The workflow will run joint calling of either a pedigree or a population for small variant calling starting from a multi-sample gVCF generated by a previous run of DRAGEN Joint Genotyping.
    Combine GVCFs: The workflow will merge single-sample gVCF files into a multi-sample gVCF file without joint calling. This gVCF can be used as input into the DRAGEN Joint Calling.
    SV de novo: The workflow will run SV joint calling to generate a multi-sample VCF file. The input to this workflow is either BAM or CRAM files from the DRAGEN Germline app.
    CNV de novo: The workflow will run CNV joint calling to generate a multi-sample VCF file. The input to this workflow is *.tn.tsv files from the DRAGEN Germline app.
    Expansion Hunter: The workflow will merge ExpansionHunter VCF files from the DRAGEN Germline app.

Input Files

    For small variants: gVCFs
    For SV: BAMs
    For CNV: .tn.tsv files generated by the DRAGEN Germline app
    For Expansion Hunter: repeat VCFs
    Optional .ped Pedigree File

Output Files

    VCF or gVCF containing small variant calls, SV calls, CNV calls, or EH calls.

Known Limitations

    The maximum recommended number of input samples for this app is 100

The following may be the correct workflow:

  1. Call gVCFs independently
  2. Merge all the gVCFs
  3. Run joint variant calling to generate a VCF (the tool can detect the sample information automatically, or the user can supply it).

Joint calling step:

dragen -f \
-r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
--enable-joint-genotyping true \
--output-directory /staging/examples/ \
--output-file-prefix Joint_SRA056922_30x_e10_50M \
--variant /staging/examples/SRA056922_30x_e10_50M.gvcf.gz

Reference URL for the workflow (needs registration): https://basespace.illumina.com/apps/12867855

Reference URL for joint calling: https://support.illumina.com/content/dam/illumina-support/help/Illumina_DRAGEN_Bio_IT_Platform_v3_7_1000000141465/Content/SW/Informatics/Dragen/gVCFJointCallingExamples_fDG_dtREF.htm
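
On the pedigree question: the optional .ped file listed under Input Files is, in the common PED convention, a six-column whitespace-separated table (family ID, individual ID, father ID, mother ID, sex, phenotype). A minimal sketch for one trio follows. The family ID, the sample IDs s1/s2/s3 (which would have to match the sample names inside the gVCFs), the child's sex and the file name are all hypothetical. The DRAGEN user guide describes a `--pedigree-file` option for joint genotyping, but check the guide for your version before relying on it.

```shell
# Write a PED file for one trio (all IDs are hypothetical placeholders).
# Columns: family, individual, father, mother, sex (1=male, 2=female),
# phenotype (1=unaffected, 2=affected); 0 means unknown / founder.
ped_file="trio.ped"
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
    FAM1 s1 s2 s3 1 2 \
    FAM1 s2 0  0  1 1 \
    FAM1 s3 0  0  2 1 > "${ped_file}"

# The file would then be passed to the joint-genotyping run, e.g. by adding
#   --pedigree-file trio.ped
# to the dragen command (assumption: option name as in the DRAGEN user guide).
```

Here s1 is the proband with s2 as father and s3 as mother; the parents are listed as founders.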

2.9 years ago
quentin54520 ▴ 120

Thanks for your answer, but maybe I was not clear enough. I would like to use the open-source pipeline developed by GATK and Illumina and incorporated into GATK (via the DRAGEN mode option). It works very well for a single sample, but if I produce a gVCF for each sample and try to combine them with, for example, GenomicsDBImport followed by GenotypeGVCFs, I obtain an aberrant result, in particular a very high number of de novo variants. The GATK team confirmed to me that they have not tested this on trios and therefore have no answer.
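
For readers following along, the GATK-side combination described above would look roughly like the sketch below. The function name, file names, workspace path and interval are hypothetical. This only reproduces the steps being discussed; it is not a validated trio workflow, and it is the route reported here to give too many de novo calls.

```shell
# Sketch of the open-source route: import per-sample gVCFs, then joint-genotype.
# All paths, sample names and the interval are hypothetical placeholders.
joint_genotype_trio() {
    ref="$1"
    interval="$2"
    out_vcf="$3"
    # Consolidate the three per-sample gVCFs into a GenomicsDB workspace
    gatk GenomicsDBImport \
        -V s1.g.vcf.gz -V s2.g.vcf.gz -V s3.g.vcf.gz \
        --genomicsdb-workspace-path trio_db \
        -L "${interval}" || return 1
    # Joint-genotype the workspace into a multi-sample VCF
    gatk GenotypeGVCFs \
        -R "${ref}" \
        -V gendb://trio_db \
        -O "${out_vcf}"
}

# Usage (hypothetical): joint_genotype_trio ref.fasta chr20 trio.vcf.gz
```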

But it does not matter in the end: using the open-source pipeline is more expensive than using the Illumina commercial pipeline in the cloud, so you might as well use the commercial pipeline, which is very fast and easy to use.

Quentin.
