Hello,
I would appreciate some feedback on sequencing data that I'm analysing with SKAT. After a bioinformatician finished processing the data, I was given a .vcf file and a VEP annotation file in tab format, both containing only variants that PASS all GATK filters.
The dataset contains 1,194 subjects and 1,069,064 annotated variants (288,547 with MAF < 0.01). I then filtered the variants further based on their annotations, resulting in:
84,474 deleterious missense variants (SIFT != "tolerated", PolyPhen > 0.908), of which 5,456 have MAF < 0.01;
31,611 loss-of-function variants (stop_gained, frameshift_variant, stop_lost, start_lost, splice_acceptor_variant, splice_donor_variant), of which 1,178 have MAF < 0.01.
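As a rough illustration of the annotation filter described above, here is a minimal Python sketch that classifies VEP records. It assumes each record has already been parsed into a dict with `Consequence`, `SIFT` and `PolyPhen` keys, that prediction values look like `deleterious(0.01)` (with `-` when absent), and that the SIFT and PolyPhen conditions are combined with AND. The function names are hypothetical, not part of VEP or SKAT.

```python
import re

# Loss-of-function consequence terms, as listed in the post.
LOF_TERMS = {
    "stop_gained", "frameshift_variant", "stop_lost",
    "start_lost", "splice_acceptor_variant", "splice_donor_variant",
}

# Assumed layout for SIFT/PolyPhen values: "label(score)", e.g. "deleterious(0.01)".
PRED_RE = re.compile(r"^([a-z_]+)\(([0-9.]+)\)$")


def parse_prediction(field):
    """Split a SIFT/PolyPhen value into (label, score); return None if absent ("-")."""
    match = PRED_RE.match(field or "")
    if match is None:
        return None
    return match.group(1), float(match.group(2))


def classify(record, polyphen_cutoff=0.908):
    """Return "lof", "deleterious_missense" or None for one parsed VEP record."""
    # VEP can report several comma-separated consequence terms per record.
    terms = set(record["Consequence"].split(","))
    if terms & LOF_TERMS:
        return "lof"
    if "missense_variant" in terms:
        sift = parse_prediction(record.get("SIFT"))
        polyphen = parse_prediction(record.get("PolyPhen"))
        # Assumption: SIFT label != "tolerated" AND PolyPhen score > cutoff.
        # A literal != also keeps "tolerated_low_confidence".
        if sift and polyphen and sift[0] != "tolerated" and polyphen[1] > polyphen_cutoff:
            return "deleterious_missense"
    return None


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    records = [
        {"Consequence": "stop_gained", "SIFT": "-", "PolyPhen": "-"},
        {"Consequence": "missense_variant", "SIFT": "deleterious(0.01)",
         "PolyPhen": "probably_damaging(0.999)"},
        {"Consequence": "missense_variant", "SIFT": "tolerated(0.3)",
         "PolyPhen": "probably_damaging(0.95)"},
        {"Consequence": "missense_variant,splice_region_variant",
         "SIFT": "deleterious(0)", "PolyPhen": "possibly_damaging(0.6)"},
        {"Consequence": "synonymous_variant", "SIFT": "-", "PolyPhen": "-"},
    ]
    print([classify(r) for r in records])
    # ['lof', 'deleterious_missense', None, None, None]
```

The MAF < 0.01 restriction would then be applied as a separate step on whichever frequency field is used.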
When I create SNP sets for all available genes (combining loss-of-function and deleterious missense variants), the number of variants per gene ranges from 1 to 50, with a median of 5.
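A minimal sketch of how the per-gene SNP sets and their size summary could be produced, assuming the qualifying variants are available as (gene, variant ID) pairs. `build_snp_sets`, `set_size_summary` and `to_setid_lines` are hypothetical helpers. The last one emits headerless two-column "set ID, variant ID" lines, which is my understanding of the SetID file layout that SKAT's `Generate_SSD_SetID` expects; this is worth checking against the SKAT manual, and the variant IDs must match those in the .bim file.

```python
from collections import defaultdict
from statistics import median


def build_snp_sets(variants):
    """Group (gene, variant_id) pairs into one SNP set per gene (duplicates collapse)."""
    snp_sets = defaultdict(set)
    for gene, variant_id in variants:
        snp_sets[gene].add(variant_id)
    return dict(snp_sets)


def set_size_summary(snp_sets):
    """Number of genes plus min/max/median number of variants per gene."""
    sizes = [len(ids) for ids in snp_sets.values()]
    return {"genes": len(sizes), "min": min(sizes),
            "max": max(sizes), "median": median(sizes)}


def to_setid_lines(snp_sets):
    """Tab-separated "set ID, variant ID" lines, no header (assumed SetID layout)."""
    return [f"{gene}\t{vid}"
            for gene in sorted(snp_sets)
            for vid in sorted(snp_sets[gene])]


if __name__ == "__main__":
    # Hypothetical gene names and chr:pos:ref:alt IDs, for illustration only.
    variants = [
        ("GENE_A", "1:100:A:G"), ("GENE_A", "1:200:C:T"),
        ("GENE_B", "2:50:G:A"),
        ("GENE_C", "3:10:T:C"), ("GENE_C", "3:20:A:T"), ("GENE_C", "3:30:G:C"),
    ]
    snp_sets = build_snp_sets(variants)
    print(set_size_summary(snp_sets))
    # {'genes': 3, 'min': 1, 'max': 3, 'median': 2}
    print("\n".join(to_setid_lines(snp_sets)))
```

Using a set per gene means a variant annotated twice for the same gene (e.g. on two transcripts) is counted only once.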
This is my first time working with this type of data, but ~1 million variants seems relatively low for an exome sequencing dataset. My concern is that the deleterious missense and loss-of-function sets will not contain enough rare variants for association testing. For example, when I run SKAT-O on a larger SNP set (variants in genes associated with a disorder, ADHD in this case), the set contains 475 variants, but only 370 of them are used for association testing; the rest show no variation in the sample.
Could anyone please comment on whether these data are suitable for analysis with SKAT-O? Does this dataset look reasonable, and if not, which processing step should I investigate?
Thank you very much!
Kind regards, Aurina
Thanks for the response; I'll have a go at gene-level analyses. Following up on the dataset: do you think it is worth exploring the data with more lenient filtering criteria (rather than keeping only variants that pass all filters)? If so, what would be an appropriate set of filters to expand the dataset while keeping noise at a reasonable level?
Any thoughts are really appreciated.