I ran GATK's pipeline on RNA-seq alignments and obtained a VCF file using Mutect2 and FilterMutectCalls. FilterMutectCalls annotated the variants with filters such as PASS, clustered_events, germline, and weak_evidence. Before annotating the file with dbSNP, COSMIC, and ANNOVAR, I would like to extract the significant somatic mutations into a separate file to make the analysis easier. Is it good practice to exclude all variants whose filter is not PASS? Also, when deciding whether a variant is germline, is the presence of the germline filter alone sufficient, or is it better to set a threshold on the GERMQ score?
Or is it better to filter out variants after annotating the files?
A lot of the terms you use here don't seem to be global terms. Can you please edit your question and explain what these terms mean? You can look at the ##INFO fields in the VCF header to understand them, except for PASS, for which you will need to look at the ##FILTER lines. Also, please add some details on your experiment design: whether there were matched normals, a panel of normals, etc.
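For anyone unsure where those definitions live, here is a minimal Python sketch that pulls the ID and Description out of the ##INFO and ##FILTER header lines. The function name `header_definitions` and the example header are made up for illustration, and the descriptions in the example are placeholders rather than the exact text GATK writes.

```python
import re

# Matches header lines such as ##FILTER=<ID=PASS,Description="...">
HEADER_RE = re.compile(r'^##(INFO|FILTER)=<ID=([^,>]+).*?Description="([^"]*)"')

def header_definitions(lines):
    """Collect ID -> Description for ##INFO and ##FILTER header lines."""
    defs = {"INFO": {}, "FILTER": {}}
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information lines are over
        match = HEADER_RE.match(line)
        if match:
            kind, key, description = match.groups()
            defs[kind][key] = description
    return defs

# Illustrative header only; real descriptions come from your own VCF.
example_header = [
    '##fileformat=VCFv4.2',
    '##FILTER=<ID=PASS,Description="Site passes all filters">',
    '##FILTER=<ID=germline,Description="Site looks germline, not somatic">',
    '##INFO=<ID=GERMQ,Number=1,Type=Integer,Description="Phred-scaled germline quality">',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO',
]

defs = header_definitions(example_header)
for kind, entries in defs.items():
    for key, description in entries.items():
        print(kind, key, description, sep="\t")
```

To run it on a real, uncompressed VCF, pass an open file handle instead of the example list.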
I have aligned RNA-seq tumour samples using STAR and I am following GATK's best practices for variant calling, in order to identify somatic mutations causing cancer. Since I do not have a normal sample, I used Mutect2 with only the required arguments.
I am encountering these filters (PASS, clustered_events, weak_evidence) for the first time, and I am reading the following document about Mutect filtering for reference:
https://github.com/broadinstitute/gatk/blob/master/docs/mutect/mutect.pdf
Apparently, FilterMutectCalls labels variants it considers likely false positives with the list of filters they failed, and variants it considers true positives with PASS.
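To see what a PASS-only selection would keep and discard, a rough Python sketch like the one below tallies the FILTER labels and keeps only PASS records. The names `split_by_filter` and `min_germq` are hypothetical, the example records are invented, and the GERMQ cutoff of 30 is an arbitrary illustration, not a recommended threshold. It assumes an uncompressed, tab-separated VCF with FILTER in column 7 and INFO in column 8.

```python
from collections import Counter

def parse_info(info):
    """Turn 'A=1;B;C=x' into a dict; flag keys map to True."""
    parsed = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed

def split_by_filter(lines, min_germq=None):
    """Return (PASS records, Counter of every FILTER label seen).

    If min_germq is given, PASS records with a missing GERMQ or a
    GERMQ below min_germq are dropped as well.
    """
    kept, counts = [], Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        filters = fields[6].split(";")
        counts.update(filters)
        if filters != ["PASS"]:
            continue
        if min_germq is not None:
            germq = parse_info(fields[7]).get("GERMQ")
            if germq is None or float(germq) < min_germq:
                continue
        kept.append(line)
    return kept, counts

# Invented records for illustration only.
records = [
    "chr1\t100\t.\tA\tT\t.\tPASS\tDP=50;GERMQ=93",
    "chr1\t200\t.\tG\tC\t.\tgermline;weak_evidence\tDP=12;GERMQ=5",
    "chr2\t300\t.\tC\tT\t.\tclustered_events\tDP=40;GERMQ=60",
    "chr2\t400\t.\tT\tG\t.\tPASS\tDP=30;GERMQ=25",
]

kept, counts = split_by_filter(records)
strict, _ = split_by_filter(records, min_germq=30)
print(len(kept), len(strict), dict(counts))
```

Looking at the tally first shows how many calls each filter removes before deciding whether to discard everything that is not PASS.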