I'm new to genome analysis, so apologies if this is a basic question. My hypothesis is that, in a particular disease, gene XYZ carries somatic mutations. I have performed targeted sequencing of gene XYZ at high coverage (from 300x to several thousand x) using SureSelect, which employs UMIs. I have sequenced the gene in brain tissue from diseased and non-diseased individuals. Unfortunately, only brain tissue is available, so a brain-versus-non-brain comparison isn't possible.
Now I'm wondering how I should perform the somatic variant analysis. As a start, I wanted to run Mutect2 and VarScan in tumor-only mode on each sample individually (I'm not working with tumors, but the situation is analogous) and only "call" a variant if both tools detect it, to add some stringency. However, I've read in online discussions that tumor-only analyses without matched normals give poor results and are discouraged.
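The "call only if both tools find it" idea can be sketched as a set intersection. This is a minimal illustration, assuming both callers' results are available as VCF text and have already been normalized (left-aligned, multi-allelic records split) so that the same variant has the same representation in both files. The helper names `read_variant_keys` and `consensus_calls` are hypothetical, not part of Mutect2, VarScan or any library, and the example records are made up.

```python
from typing import Iterable, Set, Tuple

# (CHROM, POS, REF, ALT); one key per alternate allele
Variant = Tuple[str, int, str, str]


def read_variant_keys(vcf_lines: Iterable[str], pass_only: bool = True) -> Set[Variant]:
    """Collect variant keys from VCF-formatted text lines (hypothetical helper)."""
    keys: Set[Variant] = set()
    for line in vcf_lines:
        # Skip meta-information (##), the header (#CHROM) and blank lines
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _vid, ref, alts, _qual, filt = fields[:7]
        # "PASS" = passed all filters; "." = no filters applied
        if pass_only and filt not in ("PASS", "."):
            continue
        # Multi-allelic records list several ALT alleles separated by commas
        for alt in alts.split(","):
            keys.add((chrom, int(pos), ref.upper(), alt.upper()))
    return keys


def consensus_calls(vcf_a: Iterable[str], vcf_b: Iterable[str]) -> Set[Variant]:
    """Keep only variants reported (and passing filters) in both call sets."""
    return read_variant_keys(vcf_a) & read_variant_keys(vcf_b)


if __name__ == "__main__":
    header = ["##fileformat=VCFv4.2", "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]
    mutect2 = header + [
        "chr1\t100\t.\tA\tG\t.\tPASS\t.",
        "chr1\t200\t.\tC\tT\t.\tPASS\t.",
        "chr1\t300\t.\tG\tA\t.\tweak_evidence\t.",
    ]
    varscan = header + [
        "chr1\t100\t.\tA\tG\t.\tPASS\t.",
        "chr1\t300\t.\tG\tA\t.\tPASS\t.",
    ]
    print(sorted(consensus_calls(mutect2, varscan)))  # [('chr1', 100, 'A', 'G')]
```

Requiring agreement trades sensitivity for precision: a true low-allele-fraction variant that only one caller reports (or that one caller filters) is dropped.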
In theory, I could use my non-diseased samples as normals. However, I don't even know whether gene XYZ is unmutated in the normal samples; maybe it can be randomly mutated in both groups, in which case it would not be associated with my disease. I'm also not sure it's a good approach to pair diseased individual A with healthy individual B: my understanding is that one usually pairs a tumor from individual A with healthy tissue from the same individual A, so that germline variants can be accounted for.
In summary, these are my questions:
- is it advisable to run tumor-only mode on every sample individually?
- is it admissible to use different individuals as normals for my diseased samples?
- alternatively, could I build my own panel of normals (PoN) from the non-diseased individuals? I only have about 10 non-diseased samples, so this might not be a great PoN.
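The panel-of-normals idea, in its simplest form, can be sketched as a recurrence filter. This assumes per-sample call sets from the non-diseased individuals are already available as (CHROM, POS, REF, ALT) keys; `build_simple_pon`, `filter_against_pon` and the `min_samples` threshold are hypothetical and illustrative, and this is not GATK's own PoN workflow. Note the caveat raised above: if XYZ is genuinely mutated in some healthy individuals, this filter would also remove those real variants, so variants recurring in the normals may be worth inspecting rather than silently discarding.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# (CHROM, POS, REF, ALT), same key format as the per-sample call sets
Variant = Tuple[str, int, str, str]


def build_simple_pon(normal_call_sets: Iterable[Set[Variant]], min_samples: int = 2) -> Set[Variant]:
    """Return variants seen in at least `min_samples` non-diseased samples.

    Each per-sample call set is a set, so a sample contributes at most one
    count per variant.
    """
    counts: Counter = Counter()
    for calls in normal_call_sets:
        counts.update(calls)
    return {v for v, n in counts.items() if n >= min_samples}


def filter_against_pon(case_calls: Set[Variant], pon: Set[Variant]) -> Set[Variant]:
    """Drop case variants that recur in the normals (possible artifacts or common germline variants)."""
    return case_calls - pon


if __name__ == "__main__":
    normals = [
        {("chr1", 100, "A", "G"), ("chr1", 250, "T", "C")},
        {("chr1", 100, "A", "G")},
        {("chr1", 400, "G", "T")},
    ]
    pon = build_simple_pon(normals, min_samples=2)
    case = {("chr1", 100, "A", "G"), ("chr1", 500, "C", "A")}
    print(sorted(filter_against_pon(case, pon)))  # [('chr1', 500, 'C', 'A')]
```

With about 10 normals, `min_samples=2` means a variant is treated as recurrent once it appears in 20% of the non-diseased samples; the threshold is a judgment call rather than an established default.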
I'm also very open to other ideas. Thanks!
Edit: I've found a paper that lists a few tools for non-cancer data without matched controls: Huang and Lee (2022).