I am analyzing variant calling on RNASeq data using somatic pipeline of mutect2 function in gatk. To fileter the resulting variants, I need to filter out low reads in a way that for example the sum of reads from ref and alt be more than 50. Like here:
0/1:4,21:25:64:615,0,64
4+21<50 so this variants needs to be omitted. Do you have any idea how to do it?
Also I need to remove noncoding variants from this vcf file as it is coming from RNaseq data. I need a bedfile including exonic region and coding regions to be able to filter out noncoding variants. Do you know where can I get such a file? I tried the UCSC website but I could not figure out the position criteria to get such a bed file.
Thanks Pierre, I will try that. Do you think the above threshold can replace the filtering function from picard :
for the second part of my question, how should I do this filtering? I tried using the code below
replacing
'transcript_type
"protein_coding"' with 'exon "transcript"'
but it didnt work. Do you think the above code will output the coding region wwithout specifying the exon/transcripts?