Question

Group mutations in SNP data

7.4 years ago

nanana ▴ 120

I have some somatic SNV data for multiple tumour-normal comparisons that I'm exploring in plots such as this.

This plot shows the contribution that each mutation in a given trinucleotide context makes to the total mutation load. For example, I find 2155 somatic SNVs across all samples, and 19 of these are A>C transversions in a trinucleotide context of AAA (top left of the plot), so this particular class of mutation contributes about 0.009 (19/2155) of the total mutations.

There are 12 possible mutation classes (A>G, A>C, A>T, G>C, G>T, G>A, C>A, C>G, C>T, T>A, T>C, T>G), and for each class there are 16 (4 x 4, one of four bases on each side) possible trinucleotide contexts, e.g. A>G in an AAA context. That gives 192 combinations, and I have plotted them all separately.

However, most papers I see discussing the mutational spectrum (e.g. figure; paper) only refer to the following nucleotide changes: C>A, C>G, C>T, T>A, T>C, T>G. Why is this? Does it mean that a C>A is directly equivalent to the complementary G>T?

If so, should I simply lump C>A and G>T transversions together when plotting?

SNP genome • 1.7k views


score 1 · Answer 1 · 2017-07-11

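Yes. Normally you take the reverse complement of the G>N and A>N mutations, trinucleotide context included, so that every mutation is reported relative to the pyrimidine (C or T) of the base pair. A biological mutational event inducing a C>A also causes a G>T on the other strand, so the two are the same event seen from opposite strands. For example, your A>C in an AAA context becomes T>G in a TTT context. I believe this convention was started by the Stratton group.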

You should end up with a total of 96 possible mutation contexts (6 substitution types x 16 flanking-base combinations).
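As a rough illustration of that collapsing step, here is a minimal Python sketch. It assumes each mutation is already available as a (trinucleotide context, alternate base) pair with the reference base in the middle of the context; the names `normalise`, `CLASSES` and `spectrum` are made up for this example and do not come from any particular package.

```python
from collections import Counter
from itertools import product

# Translation table for complementing a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement a DNA string, e.g. 'TGA' -> 'TCA'."""
    return seq.translate(COMPLEMENT)[::-1]


def normalise(context, alt):
    """Report a mutation relative to the pyrimidine of the base pair.

    `context` is the trinucleotide with the reference base in the middle,
    `alt` is the mutated base. Purine references (A or G) are flipped to
    the other strand; pyrimidine references are returned unchanged.
    """
    ref = context[1]
    if ref in "AG":
        context = reverse_complement(context)
        ref = context[1]
        alt = alt.translate(COMPLEMENT)
    return context, f"{ref}>{alt}"


# The 96 classes: 6 pyrimidine substitutions x 16 flanking-base pairs.
CLASSES = [
    (f"{five}{ref}{three}", f"{ref}>{alt}")
    for ref in "CT"
    for alt in "ACGT"
    if alt != ref
    for five, three in product("ACGT", repeat=2)
]


def spectrum(mutations):
    """Fraction of mutations falling in each of the 96 classes."""
    counts = Counter(normalise(ctx, alt) for ctx, alt in mutations)
    total = sum(counts.values())
    return {cls: (counts[cls] / total if total else 0.0) for cls in CLASSES}


# Toy input (made-up calls, not real data).
calls = [("AAA", "C"), ("TTT", "G"), ("TGA", "T"), ("TCA", "A")]
print(normalise("AAA", "C"))  # ('TTT', 'T>G')
print(len(CLASSES))           # 96
```

In the toy input, the first two calls fall into the same class (T>G in TTT) and so do the last two (C>A in TCA), which is exactly the lumping the question asks about.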

By the way, you can feed your mutation context data (96 scores) into deconstructSigs to determine the strength of each of the 30 COSMIC mutation signatures. This is a really interesting area of research right now.