I would like to use GATK to obtain the distribution of variants, but some of my calls have low support. For example, I do not trust a call that was found in only 2 reads. I can use a simple coverage-based filter (say, a depth of more than 20 is good and anything less is bad), but I am sure that more effective strategies exist (ones that take into account the base qualities in the reads, etc.). Could you tell me how to filter low-quality variants?
It is not for variant calling, so I do not care about impact and (I guess) cannot use SnpSift. I do not need really high accuracy; I just want my distribution of genome-wide variants to be noise-free.
I know that it looks like a newbie question, and it is. I am completely new to variant calling.
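To make concrete the kind of filter being asked about, here is a minimal sketch in plain Python, not a GATK feature. It checks one VCF data line against three thresholds: the site quality (QUAL), the total depth (INFO `DP`), and the number of reads supporting the alternate allele (the per-sample `AD` field that GATK normally writes). The function name, the threshold values, and the choice to look only at the first sample are assumptions made for illustration.

```python
def parse_info(info):
    """Turn an INFO string such as 'DP=30;MQ=60' into a dict."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


def passes_basic_filters(vcf_line, min_qual=20.0, min_depth=20, min_alt_reads=3):
    """Return True if a single VCF data line passes all three checks.

    Thresholds are illustrative assumptions, not recommended values.
    Only the first sample column is inspected.
    """
    fields = vcf_line.rstrip("\n").split("\t")

    # Column 6 is QUAL; '.' means the value is missing.
    qual = fields[5]
    if qual == "." or float(qual) < min_qual:
        return False

    # Column 8 is INFO; DP there is the total depth at the site.
    info = parse_info(fields[7])
    if "DP" in info and int(info["DP"]) < min_depth:
        return False

    # Columns 9 and 10 are FORMAT and the first sample.
    if len(fields) > 9:
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        ad = sample.get("AD", ".")
        if ad not in (".", ""):
            # AD lists the ref count first, then one count per alt allele.
            alt_counts = [int(x) for x in ad.split(",")[1:] if x != "."]
            if alt_counts and max(alt_counts) < min_alt_reads:
                return False
    return True
```

With these defaults, a call supported by only 2 alternate reads is rejected even when the total depth and QUAL look fine, which is the case described above.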
Hello. How, and with what scripts, can I apply the following filters to a file that includes all variants of the genome? Please explain in detail.
Variants with Phred-scaled quality scores below 20, variants with genotype quality (GQ) below 20, SNPs within 5 bp of an indel, indels within 10 bp of each other, and variants with a depth of coverage below 33% of, or more than twice, the mean genome coverage of the alignment.
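The filters listed above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than an established pipeline: records are assumed to be already parsed into dicts with the hypothetical keys `chrom`, `pos`, `ref`, `alt`, `qual`, `gq`, and `dp`; distances are measured between POS values only (indel length is ignored); "within N bp" is taken as a distance of at most N; the depth bounds are read as 33% and 200% of the mean coverage; and the proximity checks use all indels in the input, including ones that fail the quality checks.

```python
from collections import defaultdict


def is_indel(rec):
    """A record is treated as an indel if any ALT differs in length from REF."""
    return any(len(alt) != len(rec["ref"]) for alt in rec["alt"].split(","))


def filter_variants(records, mean_cov, min_qual=20, min_gq=20,
                    snp_gap=5, indel_gap=10):
    """Return the records that survive all of the listed filters."""
    # Collect indel positions per chromosome for the proximity checks.
    indel_pos = defaultdict(list)
    for rec in records:
        if is_indel(rec):
            indel_pos[rec["chrom"]].append(rec["pos"])

    kept = []
    for rec in records:
        # Site quality and genotype quality.
        if rec["qual"] < min_qual or rec["gq"] < min_gq:
            continue
        # Depth below 33% or above twice the mean coverage.
        if rec["dp"] < 0.33 * mean_cov or rec["dp"] > 2 * mean_cov:
            continue

        nearby = indel_pos.get(rec["chrom"], [])
        if is_indel(rec):
            # The record itself is in the list, so more than one hit
            # means another indel lies within indel_gap.
            close = sum(1 for p in nearby if abs(p - rec["pos"]) <= indel_gap)
            if close > 1:
                continue
        else:
            # Non-indel record: drop it if any indel lies within snp_gap.
            if any(abs(p - rec["pos"]) <= snp_gap for p in nearby):
                continue
        kept.append(rec)
    return kept
```

The proximity scan here is quadratic in the number of indels per chromosome, which keeps the sketch short; for a whole-genome file, sorting the positions and comparing only neighbours would be the more practical choice.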
NO. This is not an appropriate way to ask for help: you are demanding help, which makes no one want to give you their time. Also, you've added your question as an answer to an 8-year-old question. Why did you do that? Did you familiarize yourself with the etiquette of the forum before creating your post, or did you just add your post with no regard for the proper way to do anything here?
I'm moving your post to a comment for the moment.