Group mutations in SNP data
7.4 years ago
nanana ▴ 120

I have somatic SNP data for multiple tumour-normal comparisons that I'm exploring in plots such as this one.

This plot shows the contribution that each mutation in a given trinucleotide context makes to the total mutation load. For example, I find 2155 somatic SNVs across all samples, and 19 of these are A>C transversions in a trinucleotide context of AAA (top left of the plot), so this particular class of mutation contributes 0.009 (19/2155) of the total mutations.

There are 12 possible mutation classes (A>G, A>C, A>T, G>C, G>T, G>A, C>A, C>G, C>T, T>A, T>C, T>G), and for each mutation class there are 16 (4 x 4) possible trinucleotide contexts, one for each combination of 5' and 3' flanking base, e.g. A>G in an AAA context. I have plotted all of these separately.

However, most papers I see discussing the mutational spectrum (e.g. figure; paper) only refer to the following nucleotide changes: C>A, C>G, C>T, T>A, T>C, T>G. Why is this? Does it mean that a C>A is directly equivalent to the complementary G>T? Is this really the case?

If so, should I simply lump C>A and G>T transversions together when plotting?

SNP genome • 1.7k views
7.4 years ago
Samuel Brady ▴ 330

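Yes, normally you take the reverse complement of the G>N and A>N mutations, since a biological mutational event inducing a C>A also causes a G>T mutation (on the other strand). The trinucleotide context is reverse complemented along with the substitution, so an A>C in an AAA context is counted as a T>G in a TTT context. I believe this convention was started by the Stratton group.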

You should have a total of 96 possible mutation contexts (16 x 6).
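As a rough illustration of that collapsing step, here is a minimal Python sketch. The function names (`reverse_complement`, `collapse`, `collapse_counts`) and the `(context, ref, alt)` tuple layout are my own choices for this example, not part of any particular package, and it assumes each mutation has already been annotated with its trinucleotide context on the reference strand.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a DNA string, e.g. 'TGA' -> 'TCA'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def collapse(context, ref, alt):
    """Express a mutation relative to the pyrimidine (C or T) of the base pair.

    context is the trinucleotide whose middle base is ref (hypothetical layout).
    """
    if len(context) != 3 or context[1] != ref:
        raise ValueError("middle base of the context must equal ref")
    if ref in "CT":
        return context, ref, alt
    # Purine reference: flip the context and the substitution to the other strand.
    return reverse_complement(context), COMPLEMENT[ref], COMPLEMENT[alt]


def collapse_counts(counts):
    """Sum counts keyed by (context, ref, alt) into the collapsed classes."""
    collapsed = Counter()
    for (context, ref, alt), n in counts.items():
        collapsed[collapse(context, ref, alt)] += n
    return collapsed


# All 12 substitutions x 16 flanking-base combinations = 192 uncollapsed classes.
all_classes = [
    (five + ref + three, ref, alt)
    for ref in BASES
    for alt in BASES
    if alt != ref
    for five, three in product(BASES, repeat=2)
]
collapsed_classes = {collapse(*mutation) for mutation in all_classes}
```

With this, `collapse("AAA", "A", "C")` gives `("TTT", "T", "G")`, and the 192 uncollapsed classes reduce to 96, each being the sum of a pyrimidine-strand class and its reverse complement.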

By the way, you can feed your mutation context data (96 scores) into the R package deconstructSigs to estimate the contribution of each of the 30 COSMIC mutational signatures. This is a really interesting area of research right now.
