Entering edit mode
4.5 years ago
emma.berdan
•
0
Hi,
So I have called GVCF files using GATK (v4.1.0.0) and since CombineGCVFs does not work for me I am using BCFtools merge. However, this changes all missing genotypes to 0/0 even though I am not using the --missing-to-ref tag. This messes up all the downstream analyses. Any idea how to change this? My current bcftools code is:
bcftools merge -O v -g cd_hit_95.fasta\
P16M.output.g.vcf.gz \
P1M.output.g.vcf.gz \
P13F.output.g.vcf.gz \
P3F.output.g.vcf.gz \
P16FD.output.g.vcf.gz \
SKAA1.output.g.vcf.gz \
....more variant files.....
-o RAW_bcf.vcf
Thanks!
why ? That should be your only question. What was the command line ? what was the error message.
Don't. Bcftools is not designed to merge gatk gvcf blocks.
For me CombineGCVFs never produces and output even with extremely long run times and large amounts of memory. This seems to be a common problem: https://gatkforums.broadinstitute.org/gatk/discussion/11811/seems-like-combinegvcfs-is-freezing
I have some limitations as I am working with a transcriptome so I have a large number of contigs (i.e. scaffolds) but not a huge number of base pairs. This limits some of the other tools that seem like they might work.
do you split your CombineGCVF jobs per contig ?
do you split your CombineGCVF jobs per no-more-than ~200 samples ?
No, all the GVCF files are per sample. There are 65k contigs so this is not feasible. Artificial scaffolds are not recommended for RNA seq.
and there are 44 samples total.
I don't get this. Why can't you run CombineGVCF for each contig in parallel ?
I don't quite understand. Do you mean splitting each GCVF by contig and then running Combing GVCF and then combining all of those outputs?
yes. And I think you wan't to call the variants ? so you can call GenotypeGVC per contig and concatenate the 65K vcf files at the end with picard/GatherVcf
oh and you don't have to really split the gvcf files. Just tell gatk which contig you want to use with option
-L