Hi Biostars,
I want to analyse de novo missense variants in a hundred genes in the context of ASD. I have access to VCF files already annotated with SnpEff for multiple families (trios/quads), but I don't have the summary stats report. I've filtered them with SnpSift to retain only the variants in my genes of interest, and now I would like to get that stats report for the variants in those genes. Would I be able to do so without rerunning SnpEff?
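To make it concrete, the kind of summary I'm after could be approximated straight from the existing annotations. Below is a rough sketch, not a replacement for the SnpEff report: it assumes the old-style `EFF=` layout seen in my files (impact, functional class, codon change, amino acid change, amino acid length, gene, biotype, coding, transcript, exon, genotype number) and tab-separated VCF records. The function names `parse_eff` and `tally_effects` are made up for illustration.

```python
from collections import Counter

def parse_eff(info):
    """Yield (effect, impact, gene, transcript) for each EFF entry in an INFO string."""
    for field in info.split(";"):
        if not field.startswith("EFF="):
            continue
        for entry in field[len("EFF="):].split(","):
            effect, _, rest = entry.partition("(")
            parts = rest.rstrip(")").split("|")
            # Assumed old-style EFF layout: index 0 = impact,
            # index 5 = gene name, index 8 = transcript ID.
            yield effect, parts[0], parts[5], parts[8]

def tally_effects(vcf_lines):
    """Count variants per (gene, effect), counting each variant once per pair
    even when several transcripts carry the same annotation."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th VCF column
        seen = {(gene, effect) for effect, _impact, gene, _tx in parse_eff(info)}
        counts.update(seen)
    return counts

if __name__ == "__main__":
    record = "\t".join([
        "1", "8418644", ".", "C", "A", "45481.5", ".",
        "DP=219;EFF=SYNONYMOUS_CODING(LOW|SILENT|cgG/cgT|R1317|1566|RERE||CODING|NM_001042681.1|20|1),"
        "SYNONYMOUS_CODING(LOW|SILENT|cgG/cgT|R1317|1566|RERE||CODING|NM_012102.3|21|1);CADD_p=0.014",
        "GT", "0/1",
    ])
    for (gene, effect), n in tally_effects([record]).items():
        print(gene, effect, n)
```

Each variant is counted once per gene/effect pair, so a variant annotated on several transcripts of the same gene does not inflate the totals.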
As I understand it, the genotypes aren't phased:
1 8418644 . C A 45481.5 . AB=0.482143;ABP=6.26751;AC=4;AF=0.500;AN=8;AO=1909;CIGAR=1X;DP=219;DPB=2975;DPRA=0.849183;EPP=216.278;EPPR=166.614;GTI=0;LEN=1;MEANALT=1.03774;MQM=59.924;MQMR=59.9341;NS=61;NUMALT=1;ODDS=9.14088;PAIRED=0.988476;PAIREDR=0.985889;PAO=0;PQA=0;PQR=0;PRO=0;QA=55540;QR=33708;RO=1063;RPP=180.488;RPPR=129.665;RUN=1;SAF=1172;SAP=218.252;SAR=737;SRF=688;SRP=203.139;SRR=375;TYPE=snp;set=variant;technology.illumina=1;EFF=SYNONYMOUS_CODING(LOW|SILENT|cgG/cgT|R1317|1566|RERE||CODING|NM_001042681.1|20|1),SYNONYMOUS_CODING(LOW|SILENT|cgG/cgT|R1317|1566|RERE||CODING|NM_012102.3|21|1),SYNONYMOUS_CODING(LOW|SILENT|cgG/cgT|R763|1012|RERE||CODING|NM_001042682.1|10|1);TRF_score=93;CADD_p=0.014;CADD_score=-1.586562 GT:AO:DP:GQ:QA:QR:RO 0/1:40:77:99:1209:1186:37 0/1:24:54:99:730:962:30 0/1:21:46:99:649:803:25 0/1:20:42:99:608:677:22
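A quick way to check phasing across a whole file, as a minimal sketch: in VCF, alleles in `GT` are separated by `/` when unphased and by `|` when phased, and `GT` comes first in the sample field when it is present. The helper name `is_phased` is hypothetical.

```python
def is_phased(sample_field):
    """Return True if the GT sub-field of a VCF sample column uses the phased separator '|'."""
    gt = sample_field.split(":")[0]  # GT is the first FORMAT key when present
    return "|" in gt

if __name__ == "__main__":
    # Sample columns taken from the record above
    samples = ["0/1:40:77:99:1209:1186:37", "0/1:24:54:99:730:962:30"]
    print([is_phased(s) for s in samples])
```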
Would I need to phase the genotypes to estimate de novo probabilities? Could I do that with the GATK Genotype Refinement workflow (https://gatk.broadinstitute.org/hc/en-us/articles/360035531432-Genotype-Refinement-workflow-for-germline-short-variants), given that the files are already annotated by SnpEff?
Thank you,
Maxime