bcftools merge --missing-to-ref
0
0
Entering edit mode
4.5 years ago

Hi,

So I have called GVCF files using GATK (v4.1.0.0) and since CombineGCVFs does not work for me I am using BCFtools merge. However, this changes all missing genotypes to 0/0 even though I am not using the --missing-to-ref tag. This messes up all the downstream analyses. Any idea how to change this? My current bcftools code is:

bcftools merge  -O v  -g cd_hit_95.fasta\
P16M.output.g.vcf.gz \
 P1M.output.g.vcf.gz \
 P13F.output.g.vcf.gz \
 P3F.output.g.vcf.gz \
 P16FD.output.g.vcf.gz \
 SKAA1.output.g.vcf.gz \
....more variant files.....
-o RAW_bcf.vcf

Thanks!

snp • 2.7k views
ADD COMMENT
0
Entering edit mode

since CombineGCVFs does not work for me

why ? That should be your only question. What was the command line ? what was the error message.

I am using BCFtools merge.

Don't. Bcftools is not designed to merge gatk gvcf blocks.

ADD REPLY
0
Entering edit mode

For me CombineGCVFs never produces and output even with extremely long run times and large amounts of memory. This seems to be a common problem: https://gatkforums.broadinstitute.org/gatk/discussion/11811/seems-like-combinegvcfs-is-freezing

I have some limitations as I am working with a transcriptome so I have a large number of contigs (i.e. scaffolds) but not a huge number of base pairs. This limits some of the other tools that seem like they might work.

ADD REPLY
0
Entering edit mode

do you split your CombineGCVF jobs per contig ?

ADD REPLY
0
Entering edit mode

do you split your CombineGCVF jobs per no-more-than ~200 samples ?

ADD REPLY
0
Entering edit mode

No, all the GVCF files are per sample. There are 65k contigs so this is not feasible. Artificial scaffolds are not recommended for RNA seq.

ADD REPLY
0
Entering edit mode

and there are 44 samples total.

ADD REPLY
0
Entering edit mode

There are 65k contigs so this is not feasible.

I don't get this. Why can't you run CombineGVCF for each contig in parallel ?

ADD REPLY
0
Entering edit mode

I don't quite understand. Do you mean splitting each GCVF by contig and then running Combing GVCF and then combining all of those outputs?

ADD REPLY
0
Entering edit mode

yes. And I think you wan't to call the variants ? so you can call GenotypeGVC per contig and concatenate the 65K vcf files at the end with picard/GatherVcf

ADD REPLY
0
Entering edit mode

oh and you don't have to really split the gvcf files. Just tell gatk which contig you want to use with option -L

ADD REPLY

Login before adding your answer.

Traffic: 1844 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6