10 months ago
8armed
I obtained raw variant calls by running gatk's GenotypeGVCFs on a combined VCF file with default settings:
gatk GenotypeGVCFs -R REFERENCE.ASSEMBLY.fa --variant combined.vcf -output combined_RAW.variants.vcf.gz
I then ran vcftools to filter these raw variants:
vcftools --gzvcf combined_RAW.variants.vcf.gz --out combined_filtered.variants.vcf --recode --recode-INFO-all --remove-indels --minDP 5 --mac 5 --minGQ 20 --minQ 30 --max-missing 0.8
I then get the following warnings. I tried an older gatk version and the warnings remained. Are these warnings worrying, and if so, how can I fix them?
Using zlib version: 1.2.13
Warning: Expected at least 2 parts in FORMAT entry: ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another; will always be heterozygous and is not intended to describe called alleles">
Warning: Expected at least 2 parts in FORMAT entry: ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
Warning: Expected at least 2 parts in FORMAT entry: ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
Warning: Expected at least 2 parts in FORMAT entry: ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
After filtering, kept 320 out of 320 Individuals
Outputting VCF file...
After filtering, kept 13713 out of a possible 441686 Sites
Run Time = 113.00 seconds
VCFtools - 0.1.16
(C) Adam Auton and Anthony Marcketta 2009
vcftools is no longer actively maintained. Switch to bcftools please.
But it seems that not all filtering options are supported by bcftools. At least, I cannot find all of them. The full set of genotype and site filters that I'm using from vcftools in this and subsequent analyses is: --remove-indels --not-chr --minDP --max-meanDP --mac --maf --minGQ --minQ --max-missing. Are these all available in bcftools?
Overall, it seems that the filtering is working but I just get these errors...
First off, they're not errors, they're warnings. Your output should probably be fine.
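One thing that can be read straight off the log: every flagged INFO/FORMAT entry has at least one comma inside its quoted Description. Whether that is what trips the vcftools header parser is an assumption here, not something confirmed in this thread. A minimal sketch for listing such header lines in your own file (`list_comma_descriptions` is a hypothetical helper name):

```shell
# Hypothetical helper: print INFO/FORMAT header lines whose quoted
# Description contains a comma. Works on plain or gzip/bgzip-compressed VCFs.
list_comma_descriptions() {
    # gzip -cdf passes uncompressed input through unchanged.
    # awk stops at the first line that is not a "##" meta line,
    # so the variant records are never scanned.
    gzip -cdf "$1" | awk '
        /^##/ {
            if ($0 ~ /^##(INFO|FORMAT)=<.*Description="[^"]*,/) print
            next
        }
        { exit }
    '
}
```

Usage would be `list_comma_descriptions combined_RAW.variants.vcf.gz`. If the lines it prints are the same ones vcftools warns about, that supports the assumption, but it does not prove it.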
To translate a vcftools command to bcftools, you will need to read through the bcftools documentation and figure out how to translate the logic, not just look for equivalent flags. Options for --not-chr and --remove-indels seem pretty apparent from my cursory glance; did you look into the bcftools view options at all?
"--remove-indels --not-chr --minDP --max-meanDP --mac --maf --minGQ --minQ --max-missing Are these all available in bcftools?"
Yes, see the -i argument: https://samtools.github.io/bcftools/bcftools.html#expressions
Ok, great. I will take a closer look. I just talked to some colleagues though and it seems that almost all of them are still using vcftools. So it seems to be quite commonly used still. So getting a sense of how severe the above warnings are could still be useful to many.
It's got some legacy use, sure, but it will not keep up with newer VCF versions. Switch to bcftools before you end up introducing silent errors to data that you can no longer go back and correct.
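For anyone landing here later, the vcftools command from the question could be translated roughly as follows. This is a sketch under assumptions, not a verified drop-in replacement: `filter_like_vcftools` is a hypothetical wrapper name, it assumes a bcftools build with the fill-tags plugin, and it assumes the genotype-level filters (--minDP, --minGQ) should be applied before the site-level counts (--mac, --max-missing) are evaluated. Edge cases such as missing DP values or multiallelic sites may be handled differently by the two tools, so compare the number of retained sites before trusting it.

```shell
# Hypothetical wrapper; thresholds mirror the vcftools command in the question.
# Usage: filter_like_vcftools combined_RAW.variants.vcf.gz out.vcf.gz
#
# Stage 1: drop indel sites (--remove-indels) and sites with QUAL < 30 (--minQ 30).
# Stage 2: set genotypes with DP < 5 or GQ < 20 to missing (--minDP 5, --minGQ 20).
# Stage 3: recompute INFO/AN and INFO/AC so they reflect the newly missing genotypes.
# Stage 4: keep sites with minor allele count >= 5 (--mac 5) and at most
#          20% missing genotypes (--max-missing 0.8).
filter_like_vcftools() {
    in_vcf=$1
    out_vcf=$2
    bcftools view -V indels -e 'QUAL<30' -Ou "$in_vcf" |
        bcftools filter -S . -e 'FMT/DP<5 | FMT/GQ<20' -Ou - |
        bcftools +fill-tags - -Ou -- -t AN,AC |
        bcftools view -i 'MAC>=5 && F_MISSING<=0.2' -Oz -o "$out_vcf" -
}
```

The remaining flags from the list map onto the same machinery: --not-chr corresponds to a `^`-prefixed target list (`bcftools view -t ^NAME`), --maf to `MAF>=...`, and --max-meanDP to an expression along the lines of `MEAN(FMT/DP)<=...`. The values are left open here because the question does not give them.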