Sorry if this is a very basic question. I've been following the methodology of Alexandrov et al., 2013 (see http://cancer.sanger.ac.uk/cosmic/signatures) to find the mutational signature of a cancer sample, and I have become confused about how mutational signatures are supposed to be displayed on a graph, specifically regarding the normalization of mutation counts. From the COSMIC website:
Mutational signatures are displayed and reported based on the observed trinucleotide frequency of the human genome, i.e., representing the relative proportions of mutations generated by each signature based on the actual trinucleotide frequencies of the reference human genome version GRCh37.
Do I need to count the frequency of each trinucleotide context in the human genome and then somehow normalize to this? I am a bit confused about what this sentence means.
Hypothetical example for a C>A substitution type in an ACA context:
If the mutation frequency in my sample is 1% and the frequency of ACA trinucleotides in the human genome is 3.5%, what do I do next? Many thanks.
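For what it's worth, my naive attempt at counting trinucleotide frequencies looks like this (just a sketch over a toy sequence, not the real GRCh37, and ignoring the usual pyrimidine-centered / reverse-complement collapsing convention):

```python
from collections import Counter

def trinucleotide_freqs(seq):
    """Count every overlapping trinucleotide in `seq` and return
    relative frequencies (counts divided by the number of windows)."""
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()}

# Toy example; the real calculation would run over the GRCh37 sequence:
freqs = trinucleotide_freqs("ACAACGTACA")
# ACA occurs 2 times out of 8 overlapping windows, so freqs["ACA"] == 0.25
```

Is dividing my observed 1% by such a frequency (3.5% for ACA) the right next step?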
I am also confused by this statement. The way I currently interpret it is the following:
Each published signature seems to sum to 1 over the 96 dimensions. This, I assume, is the "relative proportion of mutations generated by each signature". That leaves the normalization by trinucleotide frequency: to ensure each signature still sums to 1, I think this normalization was done before the rescaling.
I.e., given an un-normalized mutational signature S and the trinucleotide frequencies T, the published signature would be: (S/T) / sum(S/T).
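A minimal sketch of that interpretation, assuming S and T are 96-element vectors aligned channel by channel (the values below are invented, not real signature data):

```python
import numpy as np

def normalize_signature(raw_sig, tri_freq):
    """Divide each mutation channel by its trinucleotide frequency,
    then rescale so the result sums to 1: (S/T) / sum(S/T)."""
    raw_sig = np.asarray(raw_sig, dtype=float)
    tri_freq = np.asarray(tri_freq, dtype=float)
    adjusted = raw_sig / tri_freq
    return adjusted / adjusted.sum()

# Toy 3-channel example standing in for the full 96 channels:
sig = normalize_signature([0.02, 0.01, 0.03], [0.04, 0.035, 0.025])
# sig sums to 1 by construction
```

Under this reading, the displayed bar heights are proportional to mutation rate per available trinucleotide, not raw counts, which would explain the COSMIC wording.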