Hi,
This is quite possibly a layman's question.
I have a MAF file with sequencing data for lymphoma specimens. I have no data on the tumor purity of the samples, and there are no matched normal samples. Germline variants have been filtered out by matching against the 1000 Genomes Project. Each mutation is mapped to COSMIC and tagged as pathogenic, likely pathogenic, or unknown.
I have read about the various tools for estimating sample purity from sequencing data (e.g., CNVkit, THetA2, FACETS).
However, I was wondering whether there is an approach that exploits the fact that the COSMIC-mapped somatic mutations should be unique to the tumor cells, both to estimate tumor purity and to normalize the allele frequencies.
Thanks, E
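For reference, the naive version of the idea in the question can be sketched as follows. It assumes every input mutation is truly somatic, clonal, heterozygous, and located in a copy-neutral diploid region, in which case the expected allele frequency is half the purity. The function name and the example values are hypothetical, and the estimate is biased upward by any germline variant that slips through the filter.

```python
from statistics import median


def naive_purity_from_vafs(vafs):
    """Estimate tumor purity as 2 * median VAF, capped at 1.0.

    Assumption (not from the thread): all variants are somatic, clonal,
    heterozygous, and sit in diploid regions, so VAF = purity / 2.
    """
    if not vafs:
        raise ValueError("need at least one VAF")
    return min(1.0, 2.0 * median(vafs))


# Hypothetical VAFs for five putative somatic mutations in one sample.
example_vafs = [0.28, 0.31, 0.30, 0.33, 0.29]
print(naive_purity_from_vafs(example_vafs))
```

The median is used rather than the mean so that a few subclonal or copy-number-altered mutations do not dominate the estimate.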
Filtering against 1KG only excludes COMMON variants. With a matched normal, you would still identify thousands of germline variants in a WGS sample that are not covered by 1KG. Without a matched normal, there is no way to discriminate somatic from germline variants.
Correct. See the recent ISOWN paper, where they tried really hard to distinguish germline from somatic variants in tumor-only samples, and lots of germline events still slipped through.
Partly because they don't adjust for purity and copy number, as they state in the Discussion. When a sample has normal contamination, there are ways to discriminate somatic from germline variants, or at least to calculate those probabilities accurately.
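A minimal sketch of why purity and copy number matter here: in an impure sample, a heterozygous germline variant is present in both tumor and normal cells, while a somatic one is present only in tumor cells, so their expected allele fractions differ. The code below is an illustration under stated assumptions, not the method of any tool named in this thread. It assumes diploid normal cells, a known purity and tumor copy number, the same number of variant-carrying tumor copies under both hypotheses, and simple binomial read sampling. All function and parameter names are hypothetical.

```python
from math import comb


def expected_vaf(purity, tumor_cn, mult, germline):
    """Expected allele fraction of a variant carried on `mult` tumor copies.

    Normal cells are assumed diploid; a germline variant is assumed
    heterozygous, contributing one extra copy per normal cell.
    """
    total = purity * tumor_cn + 2.0 * (1.0 - purity)
    alt = purity * mult + ((1.0 - purity) if germline else 0.0)
    return alt / total


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def prob_somatic(alt_reads, depth, purity, tumor_cn=2, mult=1, prior_somatic=0.5):
    """Posterior probability that a variant is somatic rather than germline."""
    p_som = expected_vaf(purity, tumor_cn, mult, germline=False)
    p_germ = expected_vaf(purity, tumor_cn, mult, germline=True)
    l_som = binom_pmf(alt_reads, depth, p_som) * prior_somatic
    l_germ = binom_pmf(alt_reads, depth, p_germ) * (1.0 - prior_somatic)
    return l_som / (l_som + l_germ)


# At 60% purity in a diploid region: somatic VAF ~0.30, germline het VAF ~0.50.
print(prob_somatic(alt_reads=30, depth=100, purity=0.6))
print(prob_somatic(alt_reads=50, depth=100, purity=0.6))
```

Note that at 100% purity the two expected allele fractions coincide in a diploid region, so the posterior simply returns the prior. That is why normal contamination is what makes the discrimination possible.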
There are actually many germline variants in COSMIC, since a lot of its entries have never been validated. COSMIC mutations have a "confirmed somatic" field that distinguishes truly somatic mutations from questionable ones.