Dear all,
I've been running pindel on a set of BAM files and then converted the output to VCF using the pindel2vcf utility.
What rules should I apply to decide whether a variant is likely present in a sample (or set of samples)? For example, if a variant has 10 supporting reads in one sample and 2 in another (both high-coverage genomes, >40x), would you trust the call in the second sample? How do you filter pindel results?
In general, you can rank calls by the number of samples carrying them and the number of supporting reads: the more supporting reads, the more reliable the call. To decide whether a call is real in a given sample with high specificity, prefer calls that are also present in other samples and are supported by a sufficient number of reads from both strands. For better sensitivity, you may go as low as a single supporting read, provided other samples show solid evidence for the same variant.
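To make the rules above concrete, here is a minimal Python sketch. It assumes you have already extracted, for each variant, the number of supporting reads per sample split by strand. The `SampleSupport` structure, the `rank_key` and `is_present` helpers, and the thresholds `min_reads=2` and `solid_reads=10` are all hypothetical illustrations, not part of pindel or pindel2vcf, and should be tuned to your own data.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class SampleSupport:
    """Supporting-read counts for one variant in one sample (hypothetical structure)."""
    forward: int  # reads supporting the variant on the + strand
    reverse: int  # reads supporting the variant on the - strand

    @property
    def total(self) -> int:
        return self.forward + self.reverse

    @property
    def both_strands(self) -> bool:
        return self.forward > 0 and self.reverse > 0


def rank_key(calls: dict[str, SampleSupport]) -> tuple[int, int]:
    """Sort key: more carrier samples first, then more total supporting reads."""
    carriers = [s for s in calls.values() if s.total > 0]
    return (len(carriers), sum(s.total for s in carriers))


def is_present(sample: str, calls: dict[str, SampleSupport],
               mode: str = "specific", min_reads: int = 2,
               solid_reads: int = 10) -> bool:
    """Decide whether `sample` carries the variant (thresholds are assumptions).

    specific:  at least `min_reads` reads in this sample, covering both strands,
               and at least `min_reads` reads in some other sample.
    sensitive: a single read suffices if another sample has `solid_reads`
               or more reads covering both strands.
    """
    me = calls[sample]
    others = [s for name, s in calls.items() if name != sample]
    if mode == "specific":
        return (me.total >= min_reads and me.both_strands
                and any(o.total >= min_reads for o in others))
    if mode == "sensitive":
        return me.total >= 1 and any(
            o.total >= solid_reads and o.both_strands for o in others)
    raise ValueError(f"unknown mode: {mode!r}")


# The scenario from the question: 10 reads in sample A, 2 reads in sample B.
calls = {"A": SampleSupport(forward=6, reverse=4),
         "B": SampleSupport(forward=2, reverse=0)}
print(is_present("B", calls, mode="specific"))   # B's reads are all on one strand
print(is_present("B", calls, mode="sensitive"))  # A provides solid evidence
```

In this example, sample B's call is rejected in the specific mode because all its reads come from one strand, but accepted in the sensitive mode because sample A supports the same variant with 10 reads on both strands. To rank a list of variants, something like `sorted(variants, key=rank_key, reverse=True)` puts the calls shared by the most samples and backed by the most reads first.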