Variant calling is a common strategy for analyzing genome differences, but in many cases the tools used produce output that is not in the standard VCF format. Two examples are bcftools isec and the show-snps utility from the MUMmer package.
all2vcf is a toolkit I wrote to convert these non-standard outputs to VCF. For now, it can convert the output of bcftools isec or the output of show-snps -T. I plan to add other non-standard formats in the future, based on needs and suggestions.
all2vcf isec
This utility processes the output of bcftools isec. That command intersects the VCF files provided as input and writes a series of output files, among which is a file called sites.txt that lists the intersection sites of the input VCFs. This file is not in VCF format, which makes it hard to use in a genome browser. Sometimes, however, one wants to view the intersected variants in a graphical interface. With all2vcf isec you can convert this file to a VCF file that retains some of the most relevant information in it. Of course, you will lose all sample-specific information such as genotype, coverage, MQ0F and other VCF-related values; you can always look those up in the original files. You will, however, be able to see the variants shared between multiple VCF files in standard VCF format.
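To make the idea concrete, here is a minimal Python sketch of this kind of conversion. It is not the actual all2vcf code. It assumes that each line of sites.txt holds five tab-separated columns (chromosome, position, REF, ALT and a presence mask such as 110 saying which input files contain the site). The function name sites_to_vcf and the INFO tag ISEC are made up for illustration and are not necessarily what all2vcf writes.

```python
import io
import sys


def sites_to_vcf(sites_lines, out):
    """Write a sites-only VCF from lines of a bcftools isec sites.txt file.

    Assumed input columns (tab-separated): CHROM, POS, REF, ALT, MASK.
    The ISEC INFO tag is a hypothetical name chosen for this sketch.
    """
    out.write("##fileformat=VCFv4.2\n")
    out.write(
        '##INFO=<ID=ISEC,Number=1,Type=String,'
        'Description="Presence mask of the site across the input files">\n'
    )
    out.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
    for line in sites_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        chrom, pos, ref, alt, mask = fields[:5]
        # No sample columns: genotype, depth etc. are not in sites.txt
        out.write(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\tISEC={mask}\n")


# Usage example with two made-up sites
example = io.StringIO("chr1\t100\tA\tG\t110\nchr1\t250\tC\tT\t011\n")
sites_to_vcf(example, sys.stdout)
```

The output has no FORMAT or sample columns, which is why the sample-specific values mentioned above cannot be carried over.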
all2vcf mummer
This utility processes the output of nucmer | show-snps -T. The latter command is used on the output of nucmer or MUMmer, which is a delta file. By running show-snps -T *.delta one can obtain the SNPs from the delta mapping file. However, the result is a tab-separated file that follows no standard and requires manual handling to extract information from it. With all2vcf mummer you can directly convert this format to a standard VCF file and then analyze it with existing VCF analysis tools. As with the isec utility, you will lose all sample-specific information, but you can always look that up in the original files.
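As a rough illustration of what such a conversion involves, here is a minimal Python sketch. It is not the actual all2vcf code. It assumes that each data row of the show-snps -T output starts with the reference position, the reference base and the query base, and that the last two columns are the reference and query sequence names; the columns in between vary with the show-snps options used and are ignored here. Rows with a "." in either base column are treated as indels and skipped, because a proper VCF indel record needs an anchor base taken from the reference sequence. The function name show_snps_to_vcf is made up for illustration.

```python
import io
import sys


def show_snps_to_vcf(lines, out):
    """Write a sites-only VCF with the SNPs found in show-snps -T output.

    Assumed row layout: P1, REF base, QUERY base, ..., REF name, QUERY name.
    Header lines are detected by a non-numeric first column.
    """
    out.write("##fileformat=VCFv4.2\n")
    out.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # header, blank or malformed line
        pos, ref, alt = fields[0], fields[1], fields[2]
        if ref == "." or alt == ".":
            continue  # indel: needs the reference sequence, skipped here
        chrom = fields[-2]  # reference sequence name (assumed position)
        out.write(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.\n")


# Usage example with made-up rows
example = io.StringIO(
    "/data/ref.fa /data/qry.fa\n"
    "NUCMER\n"
    "\n"
    "[P1]\t[SUB]\t[SUB]\t[P2]\t[BUFF]\t[DIST]\t[FRM]\t[TAGS]\n"
    "2654\tG\tA\t2654\t1\t2654\t1\t1\tchrA\tctg1\n"
    "3000\t.\tT\t3001\t5\t3000\t1\t1\tchrA\tctg1\n"
)
show_snps_to_vcf(example, sys.stdout)
```

Records are written in the order they appear in the input, so the result may still need sorting before some VCF tools accept it.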
You can clone the tool from its GitHub repository: https://github.com/MatteoSchiavinato/all2vcf
I hope it will be helpful for many people! Since it's relatively new, if you find problems with it, please open an issue so I can improve it :)
Hi, I am trying to use bcftools isec to compare .vcf files, but I am getting the error "could not pars". I used the following command: bcftools isec -n +2 RV_1.vcf.gz >RV_2.vcf.gz
I would truly appreciate the help. Thanks