Individual VCF for each sample vs. a single VCF for all samples: do the output contents differ?
8.6 years ago

Dear All,

I have performed variant calling for 24 samples using the GATK pipeline and generated a single VCF containing all 24 samples. I need some clarification on the following points.

1) If I generate a separate VCF file for each of the 24 samples individually, and also a single VCF file containing all 24 samples,

- Are there any differences between them in the output VCF?

- If yes, what are the differences?

The reason why I am asking this is, I have family level information and also symptom level information for those 24 samples.

Family level information for those 24 samples

  • FamilyA : Sample1, Sample2, Sample3

  • FamilyB : Sample4, Sample5, Sample6

  • ….

  • FamilyH : Sample22, Sample23, Sample24

Symptom level information for those 24 samples

  • Joint pain : Sample1, Sample4, Sample14, Sample15, Sample16, Sample17

  • Bleeding : Sample2, Sample5, Sample6

  • Symptom X : …..

For instance,

  • I would like to know whether the samples that are grouped together in the above scenario have any genetic variants in common. In other words, are there 'secondary' variants elsewhere in the exome (other than the X gene) that are shared among patients who suffer from the same symptoms?

- If I want to find common variants for the bleeding symptom, do the common variants differ between case1 and case2?

case1: I compare the individual VCF files (sample2.vcf, sample5.vcf and sample6.vcf) and filter for the common variants

case2: I extract just Sample2, Sample5 and Sample6 from the single VCF file with all 24 samples in it

  • As in the above example, I would like to find common variants at the family level as well.
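For readers following along, case2 could be sketched roughly as below. This is only an illustration in plain Python, not part of the GATK pipeline: the function name `shared_variants`, the tiny `EXAMPLE_VCF` records and the extra `Sample7` column are all made up for the example. It treats a variant as "common" to a group when every listed sample carries at least one non-reference allele, which is one possible definition among several.

```python
def shared_variants(vcf_lines, wanted):
    """Return (chrom, pos, ref, alt) for sites where every sample in
    `wanted` carries at least one non-reference allele.

    Hypothetical helper: assumes an uncompressed, tab-delimited VCF
    whose FORMAT column contains a GT field.
    """
    shared = []
    columns = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Map each column name (including sample names) to its index.
            columns = {name: i for i, name in enumerate(fields)}
            continue
        gt_index = fields[8].split(":").index("GT")
        carriers = 0
        for sample in wanted:
            gt = fields[columns[sample]].split(":")[gt_index]
            alleles = gt.replace("|", "/").split("/")
            # "0" is the reference allele and "." is a missing call.
            if any(a not in ("0", ".") for a in alleles):
                carriers += 1
        if carriers == len(wanted):
            shared.append((fields[0], int(fields[1]), fields[3], fields[4]))
    return shared


# Made-up records for illustration only.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
    "\tSample2\tSample5\tSample6\tSample7",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\t0/1\t0/0",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0/1\t0/0\t0/1\t0/1",
    "chr2\t300\t.\tG\tA\t50\tPASS\t.\tGT\t0/1\t./.\t1/1\t0/0",
])

bleeding = ["Sample2", "Sample5", "Sample6"]
result = shared_variants(EXAMPLE_VCF.splitlines(), bleeding)
print(result)
```

Note that the multi-sample file lets the code tell `0/0` (called homozygous reference) apart from `./.` (no call). With separate per-sample VCFs that list only variant sites, a site that is absent from one file could mean either, so case1 cannot make that distinction without extra information.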
VCF SNP variant calling DNASeq RNASeq • 2.3k views

The differences will be in the INFO column, especially in tags such as AC (allele count) and AN (allele number). The combined VCF will have those statistics aggregated across all samples. Other than that, I don't think there would be any differences.
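To make the AC/AN point concrete, here is a small sketch of how those two values follow from the genotypes at a biallelic site. The function name `allele_counts` and the genotype lists are invented for illustration; real tools compute these tags themselves, so this only shows the arithmetic.

```python
def allele_counts(genotypes):
    """Return (AC, AN) for a biallelic site from a list of GT strings.

    AC counts ALT alleles ("1") in called genotypes; AN counts all
    called alleles. Missing alleles (".") are ignored.
    """
    ac = 0
    an = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # uncalled allele, contributes to neither tag
            an += 1
            if allele == "1":
                ac += 1
    return ac, an


# One sample on its own versus the same sample among four (made-up genotypes).
single = allele_counts(["0/1"])
joint = allele_counts(["0/1", "1/1", "0/1", "0/0"])
print(single, joint)
```

The same heterozygous call gives AC=1, AN=2 in a single-sample file, while the four-sample record gives AC=4, AN=8, which is why the INFO values are not directly comparable between the two kinds of file.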


I am currently generating the individual VCF files. Once that is complete, I will post an update.

8.6 years ago
igor 13k

There is a really nice presentation that covers this specific question very well: http://cbsu.tc.cornell.edu/lab/doc/Variant_workshop_Part2.pdf (relevant part starts at page 18)


Thanks Igor for the material. It looks great.
