I have been trying to follow the instructions on this poist (https://www.biostars.org/p/16844/) to compare three vcf files. Ideally in the end I would like to be able to produce a venn diagram showing the overlapping and unique variants.
I have tried bcftools but it only takes in two files, confirming what the other user (in the post I linked) said. I then tried to run vcf-compare according to the instructions found on the vcftools website (https://vcftools.github.io/perl_module.html#vcf-compare), and I get an output saying:
Unknown parameter or non-existent file "-H".
I don't understand why as the instructions on the official website says to include -H, yet I can't find documentation on what the parameter -H does. I'm wondering if anyone else has had this problem?
Also, I have yet to find a detailed post on the output of vcf-compare, nor can I find it on the official website. I would really appreciate a sample output from somewhere so I can check to see if my command ran successfully. If anyone can provide me with a link, that would be great! Thanks!
I tried something similar before but found no tools suitable for this. I ended up writing my own Perl script (copied below).
It works with two vcf files at a time. If you want to find the intersection between three vcf files, just concatenate two of the vcfs into the same file and then compare this new vcf file against the third one. To concatenate you can do this: cat vcf1.vcf vcf2.vcf > vcf12.vcf
#!/usr/bin/perl -w
use strict;
my $vcf1 = $ARGV[0];
my $vcf2 = $ARGV[1];
my %sites1;
open(I, $vcf1) || die "Cannot read $vcf1\n";
while(<I>) {
next if(/^#/);
my($chr,$site) = (split(/\t/,$_))[0,1];
$sites1{"$chr:$site"} = 1;
}
close(I);
print scalar(keys(%sites1)), " sites in $vcf1\n";
my %sites2;
open(I, $vcf2) || die "Cannot read $vcf2\n";
while(<I>) {
next if(/^#/);
my($chr,$site) = (split(/\t/,$_))[0,1];
$sites2{"$chr:$site"} = 1;
}
close(I);
print scalar(keys(%sites2)), " sites in $vcf2\n";
my $shared = 0;
foreach my $site ( keys %sites1 ) {
if(exists($sites2{$site})) {
$shared++;
}
}
print "$shared shared between $vcf1 and $vcf2\n";