I have performed SNP calling on tomato resequencing data using SAMtools mpileup, then filtered by coverage (D100), and using Freebayes with the samtools BAQ calculation (-E) run beforehand. I am aiming to identify the SNPs of these two genomes against a reference, and also to filter those that they have in common against the reference into a separate file (I found how to do this part in another post on this website).
When I filter the SNPs from a 3.2 GB Freebayes file I get a 750 MB file, and when I filter for indels I get another 750 MB file. What is in the remaining 1.7 GB that gets filtered out, or is this just extra columns being removed? If I have taken both SNPs and indels, I thought that was everything it found?
It appears that the two files filtered using --keep-only-indels and --remove-indels in VCFtools have the same contents, with SNPs and indels mixed, so I may have to find an alternative method of separating SNPs and indels?
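A minimal sketch of one alternative way to separate the two classes, assuming a plain-text (uncompressed) VCF with the standard tab-separated columns, where REF is column 4 and ALT is column 5. The function names classify_record and split_vcf_lines are hypothetical, made up for this illustration. It treats a record as a SNP only when REF and every ALT allele are a single base, as an indel when any ALT allele differs in length from REF, and puts everything else (for example same-length multi-base substitutions) into a third group, which may also help show where records that are neither SNP nor indel end up:

```python
def classify_record(ref, alt):
    """Classify one VCF record from its REF and ALT columns.

    Returns "snp", "indel" or "other". A multi-allelic site that mixes
    a SNP allele and an indel allele is counted as "indel" here; that
    is a choice made for this sketch, not a rule from any tool.
    """
    alts = alt.split(",")
    # Missing, spanning-deletion or symbolic alleles are not classified.
    if any(a in (".", "*") or a.startswith("<") for a in alts):
        return "other"
    if len(ref) == 1 and all(len(a) == 1 for a in alts):
        return "snp"
    if any(len(a) != len(ref) for a in alts):
        return "indel"
    # Same length as REF but longer than one base (e.g. MNPs).
    return "other"


def split_vcf_lines(lines):
    """Split VCF lines into header lines and records grouped by class."""
    header = []
    groups = {"snp": [], "indel": [], "other": []}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.split("\t")
        # Column 4 (index 3) is REF, column 5 (index 4) is ALT.
        groups[classify_record(fields[3], fields[4])].append(line)
    return header, groups
```

To use it on a real file, the lines could be read with open("calls.vcf") (a placeholder file name) and each group written out together with the header lines. Comparing the record counts of the three groups against the original file would show how much falls outside the SNP and indel classes.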
Could someone also recommend what other filters to use on the SNP file, or say whether further filtering is recommended at all? I was going to remove those Freebayes SNPs that have coverage above 100, as is done for the SAMtools calls.
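The coverage cut-off described above could be sketched roughly as follows, assuming an uncompressed VCF whose INFO column (column 8) carries a DP=<integer> entry for total read depth at the site. The names info_depth and filter_by_max_depth are hypothetical, and keeping records that have no DP entry is an assumption of this sketch rather than a recommendation:

```python
def info_depth(info):
    """Return the integer DP value from a VCF INFO string, or None."""
    for entry in info.split(";"):
        # Match the DP key exactly, so keys such as DPB are ignored.
        if entry.startswith("DP="):
            try:
                return int(entry[3:])
            except ValueError:
                return None
    return None


def filter_by_max_depth(lines, max_depth=100):
    """Yield header lines and records whose INFO DP is <= max_depth.

    Records without a usable DP entry are kept (assumption of this sketch).
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            # Not a full VCF record; pass it through untouched.
            yield line
            continue
        depth = info_depth(fields[7])
        if depth is not None and depth > max_depth:
            continue
        yield line
```

Note that the INFO DP value is the depth summed over all samples in the file, so for a multi-sample VCF a threshold of 100 would probably need adjusting.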
If people want the command lines and the VCFtools commands I have used for reference, I can post them.
Rob
You might want to break your sentences up a bit; it's hard to understand what you've done and what you're asking. If I've understood correctly, you've called variants using SAMtools and Freebayes. What I'm less sure about is whether you've submitted one genome to SAMtools and the other to Freebayes, or whether you've run both genomes through each variant caller. Please clarify. Give a chronology of all the steps you took, one for SAMtools and one for Freebayes, and clarify what your aim is.