Best way to create a multi-sample VCF
7.9 years ago
Michael ▴ 10

Hello,

I have about 150 bacterial whole-genome sequences that I would like to combine into a multi-sample VCF for downstream analysis. I map the reads to the reference genome with BWA and then call variants with Pilon, which produces one VCF file per sample. I then merge the individual files with bcftools merge to create the multi-sample VCF. The problem is that the merged VCF contains a large number of no-calls, because not every strain has a call at every position.
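
To gauge how widespread the no-calls are, the merged file can be scanned site by site. The following is only a minimal sketch: the function name `count_no_calls` and the example records are hypothetical, and it assumes an uncompressed, tab-delimited VCF with a GT field in the FORMAT column.

```python
def count_no_calls(vcf_lines):
    """Return {(chrom, pos): number of samples whose genotype is missing}."""
    counts = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.split("\t")
        chrom, pos = fields[0], int(fields[1])
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue  # no genotype field at this site, nothing to count
        gt_index = fmt.index("GT")
        missing = 0
        for sample in fields[9:]:
            parts = sample.split(":")
            gt = parts[gt_index] if gt_index < len(parts) else "."
            # "./.", ".|." and "." (haploid) all count as a no-call
            alleles = gt.replace("|", "/").split("/")
            if all(allele == "." for allele in alleles):
                missing += 1
        counts[(chrom, pos)] = missing
    return counts


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t1/1\t./.\t./.",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/0:12\t1/1:9\t.:.",
]
no_calls = count_no_calls(example)
```

For a real file, the same function could be fed an open file handle instead of the example list.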

Any suggestions for a better way to create the multi-sample VCF? Thanks in advance!

Michael


Hi Michael,

You might not know markdown, but adding a tab/four spaces before a text block creates "code" layout, which was rather annoying for your post. Fixed that!

Cheers, Wouter

7.9 years ago

I think GATK CombineVariants could help you out here.

7.9 years ago
cmdcolin ★ 4.0k

You could try joint variant calling, where all samples are called together. This can be more robust than calling variants on each sample individually. I'm not sure whether Pilon supports that; it seems to be aimed at more complicated cases, such as structural variants and assembly-related issues.

Some links

http://gatkforums.broadinstitute.org/gatk/discussion/3893/calling-variants-on-cohorts-of-samples-using-the-haplotypecaller-in-gvcf-mode

http://gatkforums.broadinstitute.org/gatk/discussion/3686/why-do-joint-calling-rather-than-single-sample-calling-retired
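
The GVCF workflow described in the first link can be sketched roughly as follows. This is a hedged outline, not a tested pipeline: it assumes GATK4-style command lines (`gatk HaplotypeCaller`, `CombineGVCFs`, `GenotypeGVCFs`), haploid calling for bacteria via `--sample-ploidy 1`, and BAM files that are already sorted, indexed and tagged with read groups. The function name and file names are hypothetical, and the script only builds the commands rather than running them.

```python
import shlex


def gvcf_joint_calling_commands(reference, bams, out_vcf, ploidy=1):
    """Build (but do not run) commands for per-sample GVCFs plus joint genotyping.

    bams maps a sample name to its BAM path; all file names are placeholders.
    """
    commands = []
    gvcfs = []
    for sample, bam in bams.items():
        gvcf = f"{sample}.g.vcf.gz"
        gvcfs.append(gvcf)
        # Step 1: one GVCF per sample, keeping reference-confidence blocks
        commands.append([
            "gatk", "HaplotypeCaller",
            "-R", reference, "-I", bam, "-O", gvcf,
            "-ERC", "GVCF", "--sample-ploidy", str(ploidy),
        ])
    # Step 2: combine the per-sample GVCFs into one cohort GVCF
    combine = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        combine += ["-V", gvcf]
    combine += ["-O", "cohort.g.vcf.gz"]
    commands.append(combine)
    # Step 3: genotype all samples together
    commands.append([
        "gatk", "GenotypeGVCFs",
        "-R", reference, "-V", "cohort.g.vcf.gz", "-O", out_vcf,
    ])
    return commands


cmds = gvcf_joint_calling_commands(
    "ref.fasta", {"s1": "s1.bam", "s2": "s2.bam"}, "cohort.vcf.gz"
)
for cmd in cmds:
    print(shlex.join(cmd))
```

Because each GVCF records reference confidence at non-variant positions, the joint genotyping step can emit a reference genotype for a strain at a site that is variant only in other strains, where merging plain per-sample VCFs would leave a no-call.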
