Hi everyone,
I am trying to use vcftools (0.1.16) to filter a vcf file using the following command:
F1="--max-missing 0.5 --mac 3 --minQ 30 --remove-indels"
vcftools --vcf variants_input.vcf --out variants_output.F1 $F1 --recode-INFO-all --recode-bcf
With that, I get my output "variants_output.F1.recode.bcf" and this report:
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
Warning: Expected at least 2 parts in INFO entry: ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
After filtering, kept 27 out of 27 Individuals
Outputting BCF file...
After filtering, kept 10473667 out of a possible 598905538 Sites
Run Time = 6576.00 seconds
It seems to me the warnings are not important and that the filtering actually worked, given the report. However, when I try to run bcftools stats (bcftools 1.9) on my new output, I get an error saying that a BCF record is malformed, and no statistics are produced:
bcftools stats variants_output.F1.recode.bcf > variants_output.F1.recode.bcf.stats.txt
[E::bcf_record_check] Bad BCF record - shared section malformed or too short
Is this caused by the warnings, or by something else? I also read elsewhere on the forum that vcftools should not be used anymore, but no reason was given. Is it deprecated?
Any help would be greatly appreciated.
Hello yvanpapa,
there is no active development in vcftools, so I would call it deprecated. Use bcftools instead.
Could you please show the header and the first few variants of your vcf file before you filter? bcftools is very strict about the vcf specification, so we have to make sure you have a valid vcf file.
fin swimmer
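One way to collect what is asked for here is sketched below. The function name `show_vcf_head` and the default of five records are illustrative choices, not anything from the thread; it assumes bcftools is on the PATH and that the input is the unfiltered file from the question.

```shell
# Hypothetical helper: print the VCF header, then the first few records.
# Usage: show_vcf_head variants_input.vcf 5
show_vcf_head() {
    vcf="$1"
    n="${2:-5}"                              # number of records to show (default 5)
    bcftools view -h "$vcf"                  # -h: header lines only
    bcftools view -H "$vcf" | head -n "$n"   # -H: records without the header
}
```

If bcftools itself refuses to read the file, that already hints that the input does not follow the specification.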
Hi finswimmer,
Thanks a lot for the fast answer.
According to https://github.com/vcftools/vcftools/issues/134 the warning is "just a warning that vcftools doesn't know how to handle the comma within the Description tag. If you remove that comma in the description, the warning will go away. Otherwise, it can generally be ignored." That said, I don't find it practical to manually remove those commas every time in a routine pipeline.
I doubt the vcf is not valid, because I had no problems using bcftools stats on vcf and bcf files before using vcftools.
I guess I will perform the filtering with bcftools instead. I am pretty new to this, but it seems to me it provides the same filtering tools anyway?
What? It is absolutely usual that there are commas in the description tag. If vcftools cannot handle it, that's a bug.
You can use bcftools for filtering nearly everything. Have a look at the several options in bcftools view.
fin swimmer
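As a rough sketch, the vcftools filters from the question (`--max-missing 0.5 --mac 3 --minQ 30 --remove-indels`) could be approximated with `bcftools view` along these lines. The function name and the output file name in the usage comment are hypothetical, and the `F_MISSING` and `MAC` expressions are assumed to be supported by the installed bcftools version. The two tools may treat multiallelic or mixed sites differently, so the number of retained sites is not guaranteed to match the vcftools run.

```shell
# Hypothetical helper: approximate the vcftools filter set with bcftools.
# Usage: filter_with_bcftools variants_input.vcf variants_output.F1.bcf
filter_with_bcftools() {
    in="$1"
    out="$2"
    # Exclude sites with >50% missing genotypes, minor allele count < 3,
    # or QUAL < 30, and drop indels; write compressed BCF (-Ob).
    bcftools view \
        --exclude 'F_MISSING>0.5 || MAC<3 || QUAL<30' \
        --exclude-types indels \
        -Ob -o "$out" "$in"
    # Check that the result is readable by bcftools stats.
    bcftools stats "$out" > "$out.stats.txt"
}
```

Because the BCF is written by bcftools itself, the `bcftools stats` step doubles as a check that the output is well formed.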