I have somatic SNV data from multiple tumour-normal comparisons that I'm exploring in plots such as this:
This plot shows the contribution that each mutation in a given trinucleotide context makes to the total mutation load. For example, I find 2155 somatic SNVs across all samples, and 19 of these are A>C transversions in a trinucleotide context of AAA (top left of the plot), so this particular class of mutation contributes 0.009 (19/2155) of the total mutations.
As there are 12 possible mutation classes (A>G, A>C, A>T, G>C, G>T, G>A, C>A, C>G, C>T, T>A, T>C, T>G), and for each mutation class there are 16 (4 × 4) possible trinucleotide contexts, e.g. A>G in an AAA context, I have plotted these separately.
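To make the counting concrete, here is a minimal Python sketch of how the classes above can be enumerated. The names `substitutions`, `classes`, and `contribution` are illustrative only and are not taken from any particular package; the counts 19 and 2155 are the ones quoted above.

```python
from itertools import product

BASES = "ACGT"

# All 12 single-base substitutions (ref > alt, with ref != alt).
substitutions = [(ref, alt) for ref, alt in product(BASES, repeat=2) if ref != alt]

# Each substitution can sit between any 5' and 3' base: 4 * 4 = 16 contexts.
classes = [
    (five + ref + three, alt)
    for ref, alt in substitutions
    for five, three in product(BASES, repeat=2)
]


def contribution(count, total):
    """Fraction of the total mutation load carried by one class."""
    return count / total


n_classes = len(classes)              # 12 substitutions x 16 contexts
aaa_to_c = contribution(19, 2155)     # A>C in an AAA context
```

With all 12 substitutions kept separate, this gives 192 classes in total, which is what the plot shows.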
However, most papers I see discussing the mutational spectrum (e.g. figure; paper) only refer to the following nucleotide changes: C>A, C>G, C>T, T>A, T>C, T>G. Why is this? Does it imply that a C>A is directly equivalent to the complementary G>T? Is this really the case?
If so, should I simply lump C>A and G>T transversions together when plotting?
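For reference, this is a rough sketch of what such lumping would look like, assuming that a change and its reverse complement on the opposite strand really can be treated as the same event (which is exactly the assumption I am asking about). The helper names `reverse_complement` and `collapse` are hypothetical, not from an existing library.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def collapse(trinucleotide, alt):
    """Express a substitution with a pyrimidine (C or T) as the reference base.

    If the mutated (middle) base is already C or T, nothing changes.
    Otherwise the context is reverse-complemented and the alt base is
    complemented, so that e.g. G>T in AGT becomes C>A in ACT.
    """
    ref = trinucleotide[1]
    if ref in "CT":
        return trinucleotide, alt
    return reverse_complement(trinucleotide), alt.translate(COMPLEMENT)


# Example: a G>T in an AGT context maps onto C>A in an ACT context.
example = collapse("AGT", "T")
```

Under this scheme the 192 classes would fold into 96 (6 substitutions × 16 contexts), so my A>C in AAA would be counted together with T>G in TTT.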