6.8 years ago
Picasa
Hi,
I am working on classic variant calling in a non-model organism.
I have 250 samples and followed the GATK Best Practices.
I have produced 250 g.vcf files with HaplotypeCaller. The next step is to combine those g.vcf files and produce a .vcf (with GenotypeGVCFs), using either:
A) GenomicsDBImport
B) CombineGVCFs
But those methods are super slow.
I am wondering whether it is possible to produce one VCF per g.vcf with GenotypeGVCFs (quite fast) and then combine the 250 VCFs with another program?
Does that produce the same result? Thanks.
CatVariants should help with the combine part of your workflow, I think.
Split by chromosome.
I am working on a non-model organism whose genome has been assembled, but the assembly is quite fragmented. Is it still worth doing?
If there are 1000 contigs and you can run 1000 CombineGVCFs jobs in parallel, then it will be up to 1000 times faster.
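A minimal sketch of the per-contig idea, assuming GATK4 is on the PATH as `gatk` and that the usual `-R`, `-V`, `-L` and `-O` options are used. The function name, the contig names, the file names and the `combined/` output directory are all hypothetical. The script only builds the command lines; how they are submitted (cluster scheduler, GNU parallel, a workflow manager) is left open.

```python
import shlex


def combine_gvcfs_commands(contigs, gvcfs, reference, out_dir="combined"):
    """Build one CombineGVCFs command line per contig (nothing is executed)."""
    commands = []
    for contig in contigs:
        # -L restricts the job to a single contig, so jobs are independent.
        cmd = ["gatk", "CombineGVCFs", "-R", reference, "-L", contig]
        for gvcf in gvcfs:
            cmd += ["-V", gvcf]
        cmd += ["-O", f"{out_dir}/{contig}.g.vcf.gz"]
        commands.append(shlex.join(cmd))
    return commands


if __name__ == "__main__":
    # Hypothetical inputs: two contigs and two of the 250 g.vcf files.
    for line in combine_gvcfs_commands(
        ["contig_1", "contig_2"],
        ["sample_001.g.vcf.gz", "sample_002.g.vcf.gz"],
        "reference.fasta",
    ):
        print(line)
```

Each printed line can be run as its own job. Each per-contig output would then be genotyped with GenotypeGVCFs, and the resulting per-contig VCFs concatenated at the end.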
Hi Pierre. Do you split by chromosome for CombineGVCFs or for GenomicsDBImport?
Yes.
It should also work for GenomicsDBImport, though :)
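For a fragmented assembly, one job per contig may mean far too many tiny jobs. One possible workaround, sketched here as an assumption rather than something stated in the thread, is to group contigs into a fixed number of batches of similar total length and run one GenomicsDBImport per batch. The function names, the batch count, the sample map file name and the workspace path below are hypothetical. Passing several `-L` intervals to a single GenomicsDBImport run depends on the GATK version, so check that against your installation.

```python
import heapq


def batch_contigs(contig_lengths, n_batches):
    """Greedy longest-first packing of contigs into batches of similar size."""
    heap = [(0, i) for i in range(n_batches)]  # (total length so far, batch index)
    heapq.heapify(heap)
    batches = [[] for _ in range(n_batches)]
    for name, length in sorted(contig_lengths.items(), key=lambda kv: -kv[1]):
        total, i = heapq.heappop(heap)  # currently lightest batch
        batches[i].append(name)
        heapq.heappush(heap, (total + length, i))
    return [b for b in batches if b]


def genomicsdb_import_command(batch, sample_map, workspace):
    """Build the argument list for one GenomicsDBImport job (not executed)."""
    cmd = ["gatk", "GenomicsDBImport",
           "--sample-name-map", sample_map,
           "--genomicsdb-workspace-path", workspace]
    for contig in batch:
        cmd += ["-L", contig]
    return cmd


if __name__ == "__main__":
    # Hypothetical contig lengths, e.g. taken from the reference .fai index.
    lengths = {"contig_1": 100, "contig_2": 60, "contig_3": 50, "contig_4": 10}
    for n, batch in enumerate(batch_contigs(lengths, 2)):
        print(" ".join(genomicsdb_import_command(batch, "samples.map", f"db_batch_{n}")))
```

Each workspace would then be genotyped separately with GenotypeGVCFs (`-V gendb://db_batch_0` and so on).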