Hi all,
I was wondering if someone could help. What is the best way to compare two VCF files? I was thinking of using BEDTools, but I'm not sure whether there is anything more advanced out there that can also output statistics.
There is a program called vcf-compare in vcftools for comparing files. That's probably a good start, though you'll have to be more specific if that is not exactly what you want (vcf-stats may also be of interest).
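To make concrete what a site-level comparison reports, here is a rough sketch using only standard Unix tools. It is not how vcf-compare works internally; it keys each record on CHROM, POS, REF and ALT and counts shared and unique sites. The two toy VCFs and all file and variable names are made up for illustration, and it ignores genotypes, multi-allelic records and variant normalization.

```shell
# Rough site-level comparison of two uncompressed VCFs with POSIX tools.
# a.vcf and b.vcf are hypothetical toy files created here for illustration.
export LC_ALL=C                      # same collation for sort and comm
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

header='##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
printf "${header}1\t100\t.\tA\tG\t50\tPASS\t.\n1\t200\t.\tC\tT\t50\tPASS\t.\n" > "$tmp/a.vcf"
printf "${header}1\t200\t.\tC\tT\t50\tPASS\t.\n2\t300\t.\tG\tA\t50\tPASS\t.\n" > "$tmp/b.vcf"

site_keys() {
    # Drop header lines, keep CHROM, POS, REF, ALT, and sort for comm.
    grep -v '^#' "$1" | cut -f1,2,4,5 | sort -u
}

site_keys "$tmp/a.vcf" > "$tmp/a.sites"
site_keys "$tmp/b.vcf" > "$tmp/b.sites"

# comm -12: lines in both files; -23: only in the first; -13: only in the second.
shared=$(comm -12 "$tmp/a.sites" "$tmp/b.sites" | wc -l | tr -d ' ')
only_a=$(comm -23 "$tmp/a.sites" "$tmp/b.sites" | wc -l | tr -d ' ')
only_b=$(comm -13 "$tmp/a.sites" "$tmp/b.sites" | wc -l | tr -d ' ')

echo "shared=$shared only_a=$only_a only_b=$only_b"
```

For real data a dedicated tool is the safer choice, since this sketch treats differently represented but equivalent variants as distinct sites.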
Try vcftools. You will also need tabix and bgzip (bgzip comes with tabix). These tools compress and index the VCF file, which has to be done before using vcftools.
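The compress, index, compare sequence might look roughly like the sketch below. It assumes bgzip, tabix and vcf-compare are on your PATH and that the input VCFs are sorted by position; the function names and the a.vcf / b.vcf file names are placeholders, not anything from vcftools itself.

```shell
# Sketch: compress and index two VCFs, then compare them with vcf-compare.
# Assumes bgzip, tabix and vcf-compare are installed; names are placeholders.

# bgzip -c writes block-compressed output to stdout, leaving the input intact.
# tabix -f -p vcf builds (or overwrites) the .tbi index next to the .gz file.
prepare_vcf() {
    bgzip -c "$1" > "$1.gz" &&
    tabix -f -p vcf "$1.gz"
}

# Prepare both inputs, then print the vcf-compare report to stdout.
compare_vcfs() {
    prepare_vcf "$1" &&
    prepare_vcf "$2" &&
    vcf-compare "$1.gz" "$2.gz"
}

# Usage (not run here):
#   compare_vcfs a.vcf b.vcf > compare.txt
```

Plain gzip will not work in place of bgzip here, because tabix can only index block-compressed files.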
I wrote a tool named vcf2sqlite: it puts one or more VCFs into a local sqlite3 database. The schema is simple, and you can query the database with plain SQL statements. For example, to count the genotype (GT) calls per sample:
$ sqlite3 -column -header db.sqlite \
"select SAMPLE.name,VCFCALL.value,count(*) from VCFCALL,SAMPLE where SAMPLE.id=VCFCALL.sample_id and prop='GT' group by SAMPLE.id,VCFCALL.value"
name         value       count(*)
-----------  ----------  ----------
rmdup_1.bam  0/1         545
rmdup_1.bam  1/1         429
rmdup_2.bam  0/1         625
rmdup_2.bam  1/1         349
rmdup_3.bam  0/1         595
rmdup_3.bam  1/1         379
rmdup_4.bam  0/1         548
rmdup_4.bam  1/1         426
rmdup_5.bam  0/1         564
rmdup_5.bam  1/1         410
rmdup_6.bam  0/1         724
rmdup_6.bam  1/1         250