Entering edit mode
9 months ago
MaYo
•
0
Hi everyone,
I merged about 3500 VCF files using bcftools [bcftools merge -m none -o merged.vcf.gz my_data/*.vcf.gz]. The merge seemed successful, but before proceeding, I checked for related individuals using PLINK's --genome and KING's kinship. Both indicated that all of my data are first-degree relatives, which seems extreme.
I suspect there might be an issue with the merge, but I'm unsure how to identify and fix it. Any advice would be greatly appreciated!
Thanks, Maayan
Are these gVCF files as mentioned in the title or VCF files? In my understanding, if you merged gVCFs - having an entry for each genomic position, variable or not - the output would be a gVCF again. Possibly, this is not expected by the other tools. bcftools merge has the
-g --gvfc
to merge the gVCF blocks. Maybe this helps. If that does not work, maybe you could use GATK GenomicsDBImport -> GenotypeGVCFs instead, if the variants were called via HaplotypeCaller.thanks for the response! i double checked thanks to you - the are regular vcf files, and i still have no idea what is my problem...
Now it gets really difficult to "debug". I sometimes find it hard to predict what will happen when merging, intersecting, or otherwise joining the files. Some ideas:
-m, --merge snps|indels|both|snp-ins-del|all|none|id
or also applying bcftools norm to normalize indels.If you look at the original VCF files, are "0/0" homozygous-REF genotypes present? Or do you have "variant-only" VCF files?
It sure sounds like you have the latter. Unfortunately, that implies you need to backtrack and generate individual VCF files that do contain homozygous-REF genotypes.
Thank you, just checked and you are right! I have variant-only data in my files, no 0/0. I guess there is a way to change ./. values in the merged files into 0/0, but that is problematic since this could be wrong... So I am not sure what to do with it.
Also, can you help me understand why these files show such strong kinship?
Thanks again!