Criteria for merging multiple VCF files
5.0 years ago
seta ★ 1.9k

Dear all,

I merged many single-sample VCF files (human) into a multi-sample VCF file using bcftools (bcftools merge -m none). Before merging, I also split multiallelic sites with bcftools. However, at many sites there is no genotype call, only ./. (missing). Could you please let me know whether I should apply any filtering step to each single VCF file before merging?

Thanks

bcftools VCF • 2.2k views
5.0 years ago
Brice Sarver ★ 3.8k

If you're combining unfiltered VCFs, expect a lot of false-positive variant calls that are unique to a single sample; each of those sites then has missing genotypes in all the other samples. You could avoid or reduce this in a number of ways:

  1. See the GATK HaplotypeCaller documentation on how to generate and merge gVCFs; individuals with reference calls will then correctly have 0/0 where applicable.
  2. Filter the VCFs beforehand so that you only have confidently called variation, then merge them.
  3. Additionally, filter on a fixed number or percentage of individuals with missing data in your multi-sample VCF.
  4. Replace missing genotypes with reference calls in your final VCF (potentially problematic, since a missing call is not evidence of a homozygous-reference genotype).
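
To make options 3 and 4 concrete, here is a minimal Python sketch that works on plain tab-separated VCF text lines. The function names (`is_missing`, `missing_fraction`, `filter_by_missingness`, `missing_to_ref`) and the 20% threshold are assumptions chosen for illustration, not part of bcftools or any library. It also assumes GT is the first FORMAT subfield, as the VCF specification requires when GT is present, and that each record has at least one sample column.

```python
# Hypothetical helpers for illustration only; not a bcftools API.

def is_missing(sample_field: str) -> bool:
    """True if the GT subfield (assumed first in FORMAT) has no called allele."""
    gt = sample_field.split(":")[0]
    alleles = gt.replace("|", "/").split("/")
    return all(a == "." for a in alleles)


def missing_fraction(record: str) -> float:
    """Fraction of samples with a missing genotype in one VCF data line."""
    samples = record.rstrip("\n").split("\t")[9:]
    return sum(is_missing(s) for s in samples) / len(samples)


def filter_by_missingness(lines, max_missing=0.2):
    """Option 3: keep header lines and sites with missingness <= max_missing."""
    for line in lines:
        if line.startswith("#") or missing_fraction(line) <= max_missing:
            yield line


def missing_to_ref(record: str) -> str:
    """Option 4: rewrite missing genotypes as 0/0 (an assumption, not evidence)."""
    fields = record.rstrip("\n").split("\t")
    for i in range(9, len(fields)):
        if is_missing(fields[i]):
            parts = fields[i].split(":")
            parts[0] = "0/0"
            fields[i] = ":".join(parts)
    return "\t".join(fields)


# Toy multi-sample VCF with five made-up samples.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4\tS5",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t./.\t./.\t./.\t./.",
    "1\t200\t.\tC\tT\t60\tPASS\t.\tGT\t0/1\t1/1\t0/0\t0/1\t./.",
]

kept = list(filter_by_missingness(vcf))
for line in kept:
    print(line)
```

In this toy input the site at position 100 is called in only one of five samples and is dropped, while the site at position 200 is kept. In practice the same idea is probably easier to apply with bcftools itself: its filter expressions include an `F_MISSING` term, and `bcftools merge` has a `--missing-to-ref` option, but check the manual for your version before relying on either.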

Thanks for your comments. Actually, I have VCF files, not gVCFs, in which the minimum quality score (QUAL, the 6th column of the VCF file) is 30. I applied basic filtering to each single VCF file to keep all variants (SNPs and indels) with a minimum DP of 10 and GQ of 20. I am now going to merge the filtered single VCF files again and then keep only variants present in at least 80% of individuals in the merged (multi-sample) VCF file. Could you please share your thoughts? Is this filtering enough, or is additional filtering needed?

Thanks
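
For readers following along, the per-sample step described in this reply (minimum DP of 10 and GQ of 20) can be sketched in Python as below. This is only an illustration under stated assumptions: `passes_dp_gq` is a hypothetical helper, the input is a single-sample VCF data line with DP and GQ in the FORMAT column, and records with absent or "." values are treated as failing, which is a design choice rather than a rule.

```python
# Hypothetical helper for illustration only; thresholds follow the post above.

def passes_dp_gq(record: str, min_dp: float = 10, min_gq: float = 20) -> bool:
    """True if the single sample in this VCF data line has DP >= min_dp and GQ >= min_gq."""
    fields = record.rstrip("\n").split("\t")
    keys = fields[8].split(":")      # FORMAT column, e.g. GT:DP:GQ
    values = fields[9].split(":")    # the one sample column
    sample = dict(zip(keys, values))
    try:
        return float(sample["DP"]) >= min_dp and float(sample["GQ"]) >= min_gq
    except (KeyError, ValueError):
        # Absent or "." DP/GQ: treated as failing here (a design choice).
        return False


# Toy single-sample records with made-up values.
records = [
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP:GQ\t0/1:25:99",
    "1\t200\t.\tC\tT\t35\tPASS\t.\tGT:DP:GQ\t0/1:6:40",
    "1\t300\t.\tG\tA\t31\tPASS\t.\tGT:DP:GQ\t1/1:14:12",
    "1\t400\t.\tT\tC\t80\tPASS\t.\tGT:DP:GQ\t0/1:.:.",
]

kept_positions = [r.split("\t")[1] for r in records if passes_dp_gq(r)]
print(kept_positions)
```

One consequence worth keeping in mind: a record removed from a single-sample VCF is simply absent for that sample, so after a plain merge that sample will show ./. at the site, and those missing calls are what the 80% threshold then acts on.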
