Question

Permutation Method In Cancer Data Set (Same As Mutsic)

11.3 years ago

imagineyd ▴ 70

I have a question about the permutation method used in the TCGA paper (Somatic mutations affect key pathways in lung adenocarcinoma, 2008, Nature) and in the MutSic method.

These papers apply a permutation method to find co-occurring or mutually exclusive genes. The supplementary methods describe it in detail, but I can't understand the difference between Test 1 and Test 2. Does anyone understand the difference? The relevant passage is quoted below.

Concurrence and mutual exclusion analysis. We performed two slightly different permutation tests for mutation correlation between genes (i.e. concurrence or exclusion of mutations). In both tests, we take inter-individual differences of gene mutation prevalence into consideration. FDR corrections have been applied.

Test 1: we kept the number of samples mutated in a given gene and the number of genes mutated in a given sample the same as observed in the data and randomly permuted the observed mutations across samples and genes. For each permutation, we recorded the number of samples with concurrent and exclusive mutations (Xc and Xe, respectively) for each pair of genes and compared them with the numbers observed in the original data (Nc and Ne). We repeated this process 10,000,000 times and summarize the frequencies of Xc>=Nc and Xe>=Ne, respectively. These frequencies are used as empirical p-values under the null hypothesis (no correlation between genes).

Test 2: we kept the number of mutations in a given gene and sample the same as observed in the data and randomly permuted the observed mutations across samples and genes. The rest of the analysis is the same as above and we performed 1,000,000 permutations.

Thanks so much :)

updated 11.3 years ago by Malachi Griffith 20k • written 11.3 years ago by imagineyd ▴ 70

score 1 · Answer 1 · 2013-08-26

It's not clear whether you're referring to MuSiC or MutSig. MutSig is just a method for determining the significance of recurrently mutated genes. MuSiC is a suite of tools that includes an SMG test, but also includes other tools, like one for inferring the significance of correlation and mutual exclusivity.

The difference between the two tests lies in what each permutation is required to hold fixed. It is subtle, and comes down to counting mutated genes versus counting mutations. These are different numbers: you could have 25 samples with TP53 mutated, but those samples could carry 50 TP53 mutations in total, two per sample.
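
To make the two kinds of counts concrete, here is a small sketch in Python with NumPy. The matrix is made-up toy data, not from the paper, and the variable names are only illustrative. It assumes mutations are stored as a gene-by-sample count matrix, and shows that the totals Test 1 would preserve come from the binary "mutated or not" matrix, while the totals Test 2 would preserve come from the raw counts.

```python
import numpy as np

# Hypothetical toy data: rows are genes, columns are samples,
# each entry is the number of mutations seen in that gene and sample.
genes = ["TP53", "KRAS", "EGFR"]
counts = np.array([
    [2, 2, 0, 1],   # TP53: 5 mutations spread over 3 samples
    [1, 0, 0, 0],   # KRAS: 1 mutation in 1 sample
    [0, 0, 3, 0],   # EGFR: 3 mutations, all in 1 sample
])

# Binary view: is the gene mutated at all in the sample?
mutated = (counts > 0).astype(int)

# Quantities held fixed under the Test 1 reading (mutated genes/samples).
samples_per_gene = mutated.sum(axis=1)
genes_per_sample = mutated.sum(axis=0)

# Quantities held fixed under the Test 2 reading (individual mutations).
mutations_per_gene = counts.sum(axis=1)
mutations_per_sample = counts.sum(axis=0)

for name, n_samples, n_muts in zip(genes, samples_per_gene, mutations_per_gene):
    print(f"{name}: mutated in {n_samples} samples, {n_muts} mutations in total")
```

The two sets of totals only coincide when no gene is hit more than once in the same sample.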

Test 1: 1) preserve the number of samples with a given gene mutated (if 23 samples have TP53 muts, this is maintained) 2) preserve the number of genes mutated in a given sample (if sample X has 50 mutated genes, this is maintained)

Test 2: 1) preserve the number of mutations in a given gene (if there are 50 TP53 mutations, this is maintained) 2) preserve the number of mutations in a given sample (if sample X has 200 total mutations, this is maintained)
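
A rough sketch of how the two constraints could translate into permutation schemes, in Python with NumPy. The quoted methods text does not say how the permutations were actually generated, so both shuffles here are my assumptions: checkerboard swaps on the binary matrix for Test 1, and shuffling sample labels across individual mutations for Test 2. The toy matrix, the function names, and the small permutation counts are all hypothetical; this is not the MuSiC implementation.

```python
import numpy as np

def permute_test1(mutated, n_swaps, rng):
    """Shuffle a binary gene x sample matrix with checkerboard swaps.

    Every accepted swap leaves each row sum (samples per gene) and each
    column sum (genes per sample) unchanged.
    """
    m = mutated.copy()
    n_genes, n_samples = m.shape
    for _ in range(n_swaps):
        g1, g2 = rng.choice(n_genes, size=2, replace=False)
        s1, s2 = rng.choice(n_samples, size=2, replace=False)
        # Only swap a 2x2 block of the form [[1, 0], [0, 1]] or [[0, 1], [1, 0]].
        if (m[g1, s1] == m[g2, s2] and m[g1, s2] == m[g2, s1]
                and m[g1, s1] != m[g1, s2]):
            m[g1, s1], m[g1, s2] = m[g1, s2], m[g1, s1]
            m[g2, s1], m[g2, s2] = m[g2, s2], m[g2, s1]
    return m

def permute_test2(counts, rng):
    """Shuffle individual mutations, keeping per-gene and per-sample totals."""
    gene_idx, sample_idx = np.nonzero(counts)
    reps = counts[gene_idx, sample_idx]
    gene_labels = np.repeat(gene_idx, reps)      # one entry per mutation
    sample_labels = np.repeat(sample_idx, reps)
    shuffled = rng.permutation(sample_labels)    # reassign mutations to samples
    out = np.zeros_like(counts)
    np.add.at(out, (gene_labels, shuffled), 1)
    return out

def concurrent(m, a, b):
    """Number of samples in which genes a and b are both mutated."""
    return int(np.sum((m[a] > 0) & (m[b] > 0)))

def empirical_p(observed, permuted_values):
    """Fraction of permutations with a statistic >= the observed one."""
    return float(np.mean(np.asarray(permuted_values) >= observed))

rng = np.random.default_rng(0)

# Hypothetical toy counts: rows are genes, columns are samples.
counts = np.array([
    [2, 1, 0, 0, 1],
    [0, 1, 3, 0, 0],
    [1, 0, 0, 2, 0],
])
mutated = (counts > 0).astype(int)

n_c = concurrent(mutated, 0, 1)  # observed concurrence (Nc) for genes 0 and 1

# Far fewer permutations than the paper used, to keep the example quick.
x_c_test1 = [concurrent(permute_test1(mutated, 100, rng), 0, 1) for _ in range(200)]
x_c_test2 = [concurrent(permute_test2(counts, rng), 0, 1) for _ in range(200)]

p_test1 = empirical_p(n_c, x_c_test1)
p_test2 = empirical_p(n_c, x_c_test2)
print("Test 1 concurrence p-value:", p_test1)
print("Test 2 concurrence p-value:", p_test2)
```

Exclusion would be handled the same way, using a statistic that counts samples where exactly one of the two genes is mutated and comparing Xe against Ne. Note that under the Test 2 shuffle a gene can end up mutated in more or fewer samples than observed, because only the mutation totals are fixed; that is exactly the freedom Test 1 removes.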