Use of GenotypeGVCFs in population genetic studies
3.2 years ago
kk.mahsa ▴ 150

I have 16 whole-genome-sequenced samples from two populations (8 per population). My goal is to detect signatures of selection and introgression. I performed read cleaning, mapping to the reference, and duplicate marking. SNP calling was performed with HaplotypeCaller in GATK for each sample separately. My question: for downstream analysis (PCA, ADMIXTURE analysis, and detecting signatures of selection), do I need to use the GenotypeGVCFs command in GATK for joint genotyping? Or can I create one VCF file per sample (without GenotypeGVCFs) and merge them for downstream analysis after variant filtering?

Thanks in advance

WGS Introgression GenotypeGVCFs ADMIXTURE GATK • 2.3k views

Use the VCF produced by GenotypeGVCFs for downstream analysis.

The key difference between a regular VCF and a gVCF is that the gVCF has records for all sites, whether there is a variant call there or not. The records in a gVCF include an accurate estimation of how confident we are in the determination that the sites are homozygous-reference or not.
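To make the distinction concrete, here is a minimal Python sketch that separates the two kinds of records found in a HaplotypeCaller gVCF: reference blocks (ALT is only `<NON_REF>`, with an `END` key in INFO) and candidate variant sites. The three records are made up for illustration and do not come from any real sample.

```python
# Illustrative gVCF body lines (tab-separated, made-up values, not real data).
GVCF_LINES = [
    "chr1\t100\t.\tA\t<NON_REF>\t.\t.\tEND=150\tGT:DP:GQ:MIN_DP:PL\t0/0:30:90:28:0,90,1350",
    "chr1\t151\t.\tG\tT,<NON_REF>\t500.6\t.\tDP=32\tGT:AD:DP:GQ:PL\t0/1:15,17,0:32:99:529,0,480,574,531,1105",
    "chr1\t152\t.\tC\t<NON_REF>\t.\t.\tEND=300\tGT:DP:GQ:MIN_DP:PL\t0/0:25:60:20:0,60,900",
]


def classify(line):
    """Return (chrom, start, end, kind) for one gVCF record."""
    chrom, pos, _id, ref, alt, _qual, _filter, info = line.split("\t")[:8]
    start = int(pos)
    # Parse key=value pairs from the INFO column.
    info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    if alt == "<NON_REF>":
        # Reference block: covers start..END with a hom-ref confidence (GQ).
        return chrom, start, int(info_map.get("END", start)), "reference_block"
    # At least one real ALT allele is listed alongside <NON_REF>.
    return chrom, start, start + len(ref) - 1, "variant_site"


records = [classify(line) for line in GVCF_LINES]
for record in records:
    print(record)
```

A regular single-sample VCF would contain only the second record; the reference blocks are what let joint genotyping tell "homozygous reference" apart from "no data" at sites where another sample carries a variant.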

Thanks tothepoint. Now, if I produce a separate VCF file for each sample, how can I merge them? Must the merging be done per population?

Do you mean merging or combining the VCF files? GATK CombineGVCFs will do the job for you.

You should produce a gVCF for each sample (using HaplotypeCaller in GVCF mode), then (EDITED) combine them in order to run GenotypeGVCFs on all of them together.

(Edited to correct a mistake)
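The steps described above can be sketched as a dry run that only builds and prints the GATK4 command lines. This is a sketch under stated assumptions, not a tested pipeline: the reference path, the sample names, and the BAM and output file names are all hypothetical placeholders. The tool names and the `-R`, `-I`, `-O`, `-V` and `-ERC GVCF` arguments are standard GATK4 options.

```python
import shlex

REFERENCE = "reference.fasta"  # hypothetical path to the reference genome
# 16 placeholder sample names: 8 per population, as in the question.
SAMPLES = [f"pop{p}_s{i}" for p in (1, 2) for i in range(1, 9)]


def haplotypecaller_cmd(sample):
    """Per-sample calling in GVCF mode (-ERC GVCF)."""
    return ["gatk", "HaplotypeCaller", "-R", REFERENCE,
            "-I", f"{sample}.dedup.bam",      # hypothetical BAM name
            "-O", f"{sample}.g.vcf.gz", "-ERC", "GVCF"]


def combine_gvcfs_cmd(samples, out="cohort.g.vcf.gz"):
    """Combine all per-sample gVCFs into one multi-sample gVCF."""
    cmd = ["gatk", "CombineGVCFs", "-R", REFERENCE]
    for sample in samples:
        cmd += ["-V", f"{sample}.g.vcf.gz"]
    return cmd + ["-O", out]


def genotype_gvcfs_cmd(variant_input, out="cohort.vcf.gz"):
    """Joint genotyping on the single combined input."""
    return ["gatk", "GenotypeGVCFs", "-R", REFERENCE,
            "-V", variant_input, "-O", out]


commands = [haplotypecaller_cmd(s) for s in SAMPLES]
commands.append(combine_gvcfs_cmd(SAMPLES))
commands.append(genotype_gvcfs_cmd("cohort.g.vcf.gz"))

for cmd in commands:
    # Dry run only; use subprocess.run(cmd, check=True) to actually execute.
    print(shlex.join(cmd))
```

In this sketch all 16 samples from both populations go into one cohort, following the advice to genotype all of them together; population labels would then be applied in the downstream tools rather than at the combining step.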

vdauwera, please correct me if I am wrong: we can run GenotypeGVCFs after CombineGVCFs. I ran some analyses following the explanation on the GATK page, which states:

The GATK4 GenotypeGVCFs tool can take only one input track. Options are 1) a single single-sample GVCF 2) a single multi-sample GVCF created by CombineGVCFs or 3) a GenomicsDB workspace created by GenomicsDBImport. A sample-level GVCF is produced by HaplotypeCaller with the -ERC GVCF setting.

Oh, I misread that as CombineVCFs (without the G), sorry. Yes, you're correct. I would recommend using the GenomicsDB method rather than the basic combiner tool (that's what I had in mind; I realize now I didn't write it out, need coffee), but both are valid.

I edited my previous post to minimize confusion if someone else sees this thread.
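For anyone who wants to try the GenomicsDB route, here is a minimal dry-run sketch that builds the two command lines involved: GenomicsDBImport to create a workspace from the per-sample gVCFs, then GenotypeGVCFs reading that workspace through a `gendb://` path. The sample names, file names, workspace name and the `chr1` interval are hypothetical placeholders. Running one workspace per chromosome is an assumption made here for simplicity, not something stated in this thread. GenomicsDBImport needs at least one interval (`-L`).

```python
import shlex

REFERENCE = "reference.fasta"  # hypothetical path to the reference genome
INTERVAL = "chr1"              # placeholder; GenomicsDBImport requires -L
WORKSPACE = f"cohort_db_{INTERVAL}"  # hypothetical workspace directory name
# 16 placeholder sample names: 8 per population, as in the question.
SAMPLES = [f"pop{p}_s{i}" for p in (1, 2) for i in range(1, 9)]


def genomicsdb_import_cmd(samples, workspace, interval):
    """Import per-sample gVCFs into a GenomicsDB workspace."""
    cmd = ["gatk", "GenomicsDBImport",
           "--genomicsdb-workspace-path", workspace,
           "-L", interval]
    for sample in samples:
        cmd += ["-V", f"{sample}.g.vcf.gz"]
    return cmd


def genotype_from_db_cmd(workspace, interval, out):
    """Joint genotyping directly from the GenomicsDB workspace."""
    return ["gatk", "GenotypeGVCFs", "-R", REFERENCE,
            "-V", f"gendb://{workspace}",
            "-L", interval, "-O", out]


import_cmd = genomicsdb_import_cmd(SAMPLES, WORKSPACE, INTERVAL)
genotype_cmd = genotype_from_db_cmd(WORKSPACE, INTERVAL, f"cohort.{INTERVAL}.vcf.gz")

for cmd in (import_cmd, genotype_cmd):
    # Dry run only; use subprocess.run(cmd, check=True) to actually execute.
    print(shlex.join(cmd))
```

Either way, GenotypeGVCFs still receives a single input, whether that is the combined gVCF or the workspace.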
