Hi all,
I am looking to identify DNA-level variations from matched tumor-normal WES data. Specifically, I just want to know the variations in the tumor sample relative to the reference genome, not the normal sample.
I have noticed two possible approaches here:
- Simply use a germline-variant caller to call variants from the tumor sample, or
- Separately call germline variants (normal vs. reference) and somatic variants (tumor vs. normal), then combine the two callsets.
I'm well aware of the ploidy issues surrounding tumor samples, which is why somatic callers use separate algorithms. However, which is the better approach for my purpose?
Thanks in advance!
What is your goal? Are you trying to identify germline variants?
Hi @igor, the intention is to identify all variants regardless of source.
I would say somatic and germline variant calling are completely different analyses, so you can't really combine them.
For germline variants, you can call them in the normal (N) and use the tumor (T) as the validation sample.
The TCGA PDAC paper is a nice example that discusses many somatic and germline variants side by side: https://www.ncbi.nlm.nih.gov/pubmed/28810144
I understand this. The purpose of the variant calling in question, however, is not to study the biological significance of the variants. I plan to do some RNA-editing research, so I need to identify DNA-level variations to act as blacklist positions.
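For what it's worth, if the callsets are only needed as a position blacklist, the combining step could be sketched roughly as below. This is a minimal sketch, not a method from any particular caller: it assumes plain tab-separated VCF records, the names `variant_sites` and `build_blacklist` are hypothetical, and blacklisting every reference base spanned by a record (so deletions mask all deleted bases) is my own assumption about what is useful for RNA-editing filtering.

```python
from typing import Iterable, Set, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position)


def variant_sites(vcf_lines: Iterable[str], pass_only: bool = False) -> Set[Site]:
    """Collect the reference positions covered by the records of one VCF.

    Assumption: each record has at least the 7 standard leading columns
    (CHROM, POS, ID, REF, ALT, QUAL, FILTER), separated by tabs.
    """
    sites: Set[Site] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, filt = fields[0], int(fields[1]), fields[3], fields[6]
        if pass_only and filt not in ("PASS", "."):
            continue  # optionally drop records that failed a filter
        # Assumption: mask every reference base the record spans.
        for offset in range(len(ref)):
            sites.add((chrom, pos + offset))
    return sites


def build_blacklist(*callsets: Iterable[str]) -> Set[Site]:
    """Union of the sites from any number of callsets (germline, somatic, ...)."""
    blacklist: Set[Site] = set()
    for lines in callsets:
        blacklist |= variant_sites(lines)
    return blacklist


# Toy usage with in-memory records; real input would be lines read from VCF files.
germline = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",
]
somatic = [
    "chr1\t200\t.\tCT\tC\t.\tPASS\t.\n",
]
blacklist = build_blacklist(germline, somatic)
```

Because the result is just a set of positions, it does not matter here whether a site came from the germline or the somatic callset; a candidate RNA-editing site would simply be dropped if `(chrom, pos)` is in `blacklist`.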