11 months ago
Michal Frenkel
I'm attempting to combine around 140 .g.vcf files into a single file using GLnexus on the DNAnexus platform. To examine multiallelic variants, I normalize each file with the command bcftools norm -m-any $file. Merging the original VCF files (generated with GATK) works without problems, but merging the normalized files yields no variants at all. Can you provide insights into why this might be happening and suggest possible solutions? Additionally, what happens to multiallelic variants when the files are merged without normalization?
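To make the question concrete, here is a toy model of what splitting every ALT allele into its own record does to a typical GATK gVCF variant line, whose ALT list always ends in <NON_REF>. This is a simplified sketch, not bcftools's actual implementation: it rewrites only the ALT column and copies INFO, FORMAT and sample fields unchanged (a real normalizer must re-index GT, AD, PL and so on), and the example line is hypothetical.

```python
def split_multiallelic(line: str) -> list[str]:
    """Naively split one VCF data line into one record per ALT allele.

    Only the ALT column (5th column, index 4) is rewritten; all other
    columns, including genotype fields, are copied as-is.
    """
    fields = line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    if len(alts) == 1:
        # Already biallelic (or a reference block): leave untouched.
        return [line.rstrip("\n")]
    records = []
    for alt in alts:
        out = list(fields)
        out[4] = alt
        records.append("\t".join(out))
    return records


# Hypothetical GATK-style variant record: one real ALT plus <NON_REF>.
gvcf_line = "chr1\t1000\t.\tA\tC,<NON_REF>\t50.0\t.\t.\tGT\t0/1"
for rec in split_multiallelic(gvcf_line):
    print(rec)
```

The second record produced has <NON_REF> as its only ALT but no END tag in INFO, so it looks like a malformed reference block rather than part of a variant call.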
Show us some lines of the normalized files.
Here is an example of some lines from the normalized VCF. I checked, and there are lines where the 4th column is not <NON_REF>, but I didn't manage to copy them.
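One way to pull out the lines that are not pure reference blocks, so they can be pasted into the thread, is a small filter. Note that in the VCF layout ALT is the fifth column (index 4 when counting from zero). This is a minimal sketch under the assumption that the file is plain or bgzip-compressed text; the script name used below is hypothetical.

```python
import gzip
import sys
from typing import Iterable, Iterator


def open_vcf(path: str):
    """Open a plain or bgzip/gzip-compressed VCF as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def variant_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield data lines whose ALT column is anything other than <NON_REF>."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip ## meta lines and the #CHROM header
        fields = line.rstrip("\n").split("\t")
        if fields[4] != "<NON_REF>":
            yield line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_vcf(sys.argv[1]) as fh:
        for rec in variant_records(fh):
            print(rec)
```

Usage (hypothetical file names): python find_variant_lines.py sample.norm.g.vcf.gz | head -n 20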
Can you find a line that changed after the bcftools step?
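Finding a record that changed can be automated by comparing the data lines of the original and normalized files and reporting normalized lines that do not appear verbatim in the original. This is a sketch, not a full VCF diff: it ignores header lines, treats any byte-level difference as a change, and keeps every original record in memory, so for a whole-genome gVCF it is more practical on a single chromosome or region extract. The command-line arguments are hypothetical.

```python
import gzip
import sys
from typing import Iterable


def open_vcf(path: str):
    """Open a plain or bgzip/gzip-compressed VCF as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def data_lines(lines: Iterable[str]) -> list[str]:
    """Return non-header lines with trailing newlines stripped."""
    return [line.rstrip("\n") for line in lines if not line.startswith("#")]


def changed_records(original: Iterable[str], normalized: Iterable[str]) -> list[str]:
    """Normalized data lines (in file order) absent from the original."""
    before = set(data_lines(original))
    return [rec for rec in data_lines(normalized) if rec not in before]


if __name__ == "__main__" and len(sys.argv) > 2:
    with open_vcf(sys.argv[1]) as orig_fh, open_vcf(sys.argv[2]) as norm_fh:
        for rec in changed_records(orig_fh, norm_fh):
            print(rec)
```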
I found an example. In the original .g.vcf:
In the normalized .g.vcf it looks like this:
Oooh, that might be a bug: it's treating the <NON_REF> sentinel as a kind of allele change. Really, that line should not be touched unless it was truly multiallelic (e.g. A->C, A->G), and any reference range should include an END.
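The two rules in the reply above can be written down as checks: a record needs splitting only if it has two or more real ALT alleles once the symbolic allele is excluded, and a record whose ALT list is only symbolic (a reference range) must carry END in INFO. This is a minimal sketch of those checks, not part of bcftools; treating <*> (the symbolic allele bcftools itself uses in gVCF output) the same as <NON_REF> is an assumption added here. Running missing_end over a normalized file should flag the split leftovers described in this thread.

```python
from typing import List

# Assumption: both GATK's <NON_REF> and bcftools' <*> count as symbolic.
SYMBOLIC_ALTS = {"<NON_REF>", "<*>"}


def real_alts(alt_field: str) -> List[str]:
    """ALT alleles excluding the symbolic 'any other allele' sentinel."""
    return [a for a in alt_field.split(",") if a not in SYMBOLIC_ALTS]


def needs_split(alt_field: str) -> bool:
    """True only for truly multiallelic records, e.g. A -> C,G,<NON_REF>."""
    return len(real_alts(alt_field)) >= 2


def has_end(info_field: str) -> bool:
    """True if the INFO column contains an END=... key."""
    return any(kv.split("=", 1)[0] == "END" for kv in info_field.split(";"))


def missing_end(line: str) -> bool:
    """Flag a reference-range record (only symbolic ALTs) that lacks END."""
    fields = line.rstrip("\n").split("\t")
    alt, info = fields[4], fields[7]
    return not real_alts(alt) and not has_end(info)
```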
Can you file an issue here https://github.com/samtools/bcftools/issues and include the example above and the version number of bcftools?
Good catch.