Distribution of somatic mutation
7.8 years ago
CY ▴ 750

So... I have two questions.

1) Do somatic mutations follow a Poisson distribution, as germline mutations do?

2) We now use Bayesian models to detect germline mutations. Do the mutations have to follow a specific Poisson distribution in order to use a Bayesian model? If not, can we use a Bayesian model for somatic mutation calling? Say we sequence 10,000 tumors and use the mutation frequency at a specific position as the prior probability for that position.

Can someone share some ideas on this? I'd really appreciate it!

next-gen SNP • 1.8k views
7.8 years ago
solo7773 ▴ 90

Though I may not know the right answer for you, I'd like to post my opinion as an answer.

1) According to my analysis of all somatic mutations in the latest ICGC data release (landscape of somatic mutations; raw data provided as supplementary material), the Poisson distribution doesn't apply to somatic mutations. Judging by the number of mutations per sample, the counts are more likely to follow a normal or Weibull distribution, though it is hard to say exactly which distribution fits.

Let's have a look at the somatic mutation counts for the ICGC WGS samples:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0    1506    3668   12450    8816  722400
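As a rough sanity check on the summary above, the sketch below asks what quartiles a single Poisson distribution with the same mean would produce. It uses the normal approximation Poisson(m) ≈ Normal(m, sqrt(m)), which is reasonable for a mean this large. The function name `poisson_quartiles` is made up for illustration, and the check assumes one shared mutation rate across all samples, which is exactly the assumption under test.

```python
import math
from statistics import NormalDist

# Summary statistics quoted above (somatic mutations per WGS sample).
observed = {"q1": 1506, "median": 3668, "mean": 12450, "q3": 8816, "max": 722400}


def poisson_quartiles(mean):
    """Approximate the quartiles of Poisson(mean) with Normal(mean, sqrt(mean))."""
    approx = NormalDist(mu=mean, sigma=math.sqrt(mean))
    return approx.inv_cdf(0.25), approx.inv_cdf(0.75)


expected_q1, expected_q3 = poisson_quartiles(observed["mean"])
expected_iqr = expected_q3 - expected_q1
observed_iqr = observed["q3"] - observed["q1"]

print(f"Poisson-expected quartiles: {expected_q1:.0f} to {expected_q3:.0f}")
print(f"Poisson-expected IQR: {expected_iqr:.1f}")
print(f"Observed IQR:         {observed_iqr}")
```

Under a single Poisson rate the interquartile range would be on the order of 150 mutations, while the observed one spans several thousand, and the mean sits far above the median. That degree of overdispersion and skew is what rules out a plain Poisson model for per-sample counts.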

Take the data between the first and third quartiles to find the distribution that fits. That subset looks like this (the x axis is the sequential order of individuals):

[Figure: distribution of mutation counts across individuals]

After fitting the distributions, the evaluation is as follows:

[Figure: evaluation of the distribution fits]
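The kind of comparison described above can be sketched with the standard library alone. This is not the analysis behind the figures: the data are synthetic (drawn from a Weibull distribution with an arbitrary shape and scale), and `fit_normal` and `fit_weibull` are hypothetical helpers. Both models are fitted by maximum likelihood and compared by log-likelihood; since each has two parameters, that is equivalent to comparing AIC.

```python
import math
import random


def fit_normal(xs):
    """Maximum-likelihood normal fit; returns (mu, sigma, log-likelihood)."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    loglik = sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in xs
    )
    return mu, sigma, loglik


def fit_weibull(xs):
    """Maximum-likelihood Weibull fit; returns (shape, scale, log-likelihood).

    All values must be strictly positive.
    """
    n = len(xs)
    unit = sum(xs) / n                 # rescale so powers stay in float range
    ys = [x / unit for x in xs]
    mean_log = sum(math.log(y) for y in ys) / n

    def score(k):
        # Profile-likelihood equation for the shape; increasing in k, zero at the MLE.
        powers = [y ** k for y in ys]
        weighted = sum(p * math.log(y) for p, y in zip(powers, ys))
        return weighted / sum(powers) - 1.0 / k - mean_log

    lo, hi = 0.01, 50.0                # assumed search bracket for the shape
    for _ in range(100):               # plain bisection
        mid = (lo + hi) / 2
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = (lo + hi) / 2
    scale = unit * (sum(y ** shape for y in ys) / n) ** (1 / shape)
    loglik = sum(
        math.log(shape / scale)
        + (shape - 1) * math.log(x / scale)
        - (x / scale) ** shape
        for x in xs
    )
    return shape, scale, loglik


# Synthetic stand-in for per-sample mutation counts (NOT ICGC data).
rng = random.Random(0)
counts = [rng.weibullvariate(5000, 1.5) for _ in range(2000)]  # scale 5000, shape 1.5

mu, sigma, ll_normal = fit_normal(counts)
shape, scale, ll_weibull = fit_weibull(counts)
print(f"normal : mu={mu:.0f} sigma={sigma:.0f} loglik={ll_normal:.1f}")
print(f"weibull: shape={shape:.2f} scale={scale:.0f} loglik={ll_weibull:.1f}")
```

A higher log-likelihood means a better fit. Note that fitting only the values between the first and third quartiles, as done above, truncates both tails, so a fit on that subset says little about the extreme samples.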

2) Based on my understanding of Bayes, your data doesn't need to follow a specific distribution. Bayesian inference combines a prior with the observed data to obtain the posterior, so you do need to know the prior. Nowadays there are many excellent tools for somatic variant calling. If you are not developing a novel method, you are encouraged to use existing tools; they are reliable and widely used in the community. What's more, somatic mutations come in different types, e.g. substitutions, indels, structural variations, etc. Are you going to detect all of them? It seems you want to call somatic mutations based on statistical information alone. My understanding is that current tools instead map the cancer sequences to the reference genome.
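To make the role of the prior concrete, here is a minimal sketch of the idea from the question: use the fraction of previously sequenced tumors mutated at a position as the prior, then update it with the read counts at that position. Everything here is an illustrative assumption rather than how any existing caller works: the function names, the two-hypothesis setup (true mutation at a fixed allele fraction vs. sequencing error), and the default `vaf` and `error_rate` values. Real callers also model the matched normal, base and mapping qualities, and more.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials with success rate p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def cohort_prior(mutated_tumors, total_tumors):
    """Prior mutation probability at a position from a cohort.

    One pseudocount per outcome keeps the prior away from exactly 0 or 1.
    """
    return (mutated_tumors + 1) / (total_tumors + 2)


def posterior_somatic(alt_reads, depth, prior, vaf=0.3, error_rate=0.01):
    """Posterior probability of a true mutation given the reads at one position.

    vaf and error_rate are assumed values for illustration only.
    """
    like_mutation = binom_pmf(alt_reads, depth, vaf)        # reads if truly mutated
    like_error = binom_pmf(alt_reads, depth, error_rate)    # reads if only errors
    evidence = prior * like_mutation + (1 - prior) * like_error
    return prior * like_mutation / evidence


# Hypothetical position mutated in 50 of 10,000 previously sequenced tumors.
prior = cohort_prior(50, 10_000)
print(f"prior: {prior:.4f}")
print(f"8 alt reads of 40: {posterior_somatic(8, 40, prior):.5f}")
print(f"1 alt read of 40:  {posterior_somatic(1, 40, prior):.5f}")
```

Nothing in this calculation requires the mutation counts to be Poisson distributed; the only distributional assumption is the binomial model for reads at a single position.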
