2.0 years ago
ej6474
I have a large multi-sample VCF (~500 subjects) that I am trying to annotate with ClinVar, and I also want to identify which of my subjects carry the annotated variants. My main questions are:
- How many pathogenic variants can I find (particularly in the ACMG 59 genes recommended for reporting)?
- What is the MAF of these pathogenic-annotated variants across my subjects?
What is the best way to do this in parallel?
I was thinking of using VEP with the ClinVar flag: split the large multi-sample VCF into 20 or more regions, annotate those regions in parallel, then combine them back into one multi-sample VCF. Is that a reasonable approach?
However, I am not sure how to get the MAF across all of my subjects for the variants annotated as 'Pathogenic'. Does anyone have any suggestions?
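To make the MAF part of the question concrete, here is a minimal sketch of the calculation in plain Python. It is not a definitive pipeline and rests on several assumptions: the annotated VCF carries the ClinVar significance in an INFO key named `CLNSIG` (the key used in ClinVar's own VCF; VEP output may store it elsewhere, for example inside the CSQ string), multiallelic sites have already been split into biallelic records, and the function name `pathogenic_allele_freqs` is purely illustrative.

```python
import re

def pathogenic_allele_freqs(vcf_lines, clnsig_key="CLNSIG", wanted=("Pathogenic",)):
    """Return one dict per record whose ClinVar significance matches `wanted`.

    Assumptions (not from the original post): biallelic records, a GT entry
    in FORMAT, and ClinVar significance stored under INFO key `clnsig_key`.
    """
    rows = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, _qual, _filt, info, fmt = fields[:9]
        samples = fields[9:]

        # Parse INFO into key -> value; flag-only entries are ignored.
        info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        # Split combined values such as "Pathogenic/Likely_pathogenic" into tokens,
        # so "Conflicting_interpretations_of_pathogenicity" is not matched by accident.
        tokens = re.split(r"[/|,]", info_map.get(clnsig_key, ""))
        if not any(t in wanted for t in tokens):
            continue

        gt_idx = fmt.split(":").index("GT")
        alt_count = called = 0
        for sample in samples:
            gt = sample.split(":")[gt_idx]
            for allele in re.split(r"[/|]", gt):
                if allele == ".":
                    continue  # missing calls do not count toward the denominator
                called += 1
                if allele == "1":
                    alt_count += 1

        af = alt_count / called if called else None
        maf = min(af, 1 - af) if af is not None else None
        rows.append({"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
                     "alt_count": alt_count, "called": called, "af": af, "maf": maf})
    return rows
```

Because each record is processed independently, the same function could be run on each regional VCF after annotation and the resulting rows simply concatenated. Restricting to the ACMG genes would need an extra filter on whatever gene field the annotation provides.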