Hi all,
I need to perform GISTIC and MutSig analyses on a cohort of samples. Some of the patients have multiple biopsies, taken either A) at the same time point or B) at different time points. I was wondering what the best way is to handle this situation for the two analyses I mentioned.
Specifically:
1) For MutSig, how would it affect the results if I took the union of all mutation calls from the different tissues per patient?
2) For GISTIC, is it even possible to take a "union" of the copy number calls?
3) Again for GISTIC, is it advisable to take an average of the copy number calls across all the samples per patient?
I wanted to know if there is a best practice to deal with such situations while performing the said analyses. Thanks!
There are several issues with merging data from different biopsies. For biopsies taken at the same time point, the major issue is inter- and intra-tumor heterogeneity. For biopsies taken at different time points, it is tumor evolution: for example, the molecular landscape of metastatic castrate-resistant prostate cancer is not similar to the landscape of stage I or II prostate cancer.
1A) MutSig analysis aims first of all at identifying driver genes. If a gene is a driver, not a passenger, then we can expect its mutation to be an early event in tumor evolution, and therefore to be present in all tumor subclones with a high mutant allele fraction. If the mutation is present in one subclone but absent in another, this may hint that either the gene is not a driver or the mutation call is just an artifact. Therefore it seems suitable to take into account only the mutations present in all biopsies, i.e. not the union of the mutations but their intersection. In that case the multiple biopsies simply increase the robustness of mutation detection, and therefore of your analysis.
1B) You definitely should not merge the data if you expect that the disease changed its molecular landscape in the period between biopsies. A rule of thumb is that after 2 lines of chemotherapy the disease has significantly changed its molecular landscape.
2) You can do this, for example, starting from the raw data: merge the FASTQ files while taking the average sample coverage into account (downsample the more deeply covered sample, if that would not drastically change the read count).
3) Same as 1A and 1B.
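To make the union vs. intersection distinction in 1A concrete, here is a minimal Python sketch. It is not part of MutSig or GISTIC; the function name `combine_calls`, the tuple layout and the example coordinates are all assumptions made up for illustration. It assumes you have already loaded your mutation calls as (patient, biopsy, chrom, pos, ref, alt) records, e.g. from a MAF file.

```python
from collections import defaultdict


def combine_calls(calls, mode="intersection"):
    """Combine per-biopsy mutation calls into one call set per patient.

    calls: iterable of (patient, biopsy, chrom, pos, ref, alt) tuples
           (hypothetical layout, adapt to your own MAF columns).
    mode:  "intersection" keeps variants seen in every biopsy of a patient,
           "union" keeps variants seen in at least one biopsy.
    """
    # patient -> biopsy -> set of variant keys
    per_biopsy = defaultdict(lambda: defaultdict(set))
    for patient, biopsy, chrom, pos, ref, alt in calls:
        per_biopsy[patient][biopsy].add((chrom, pos, ref, alt))

    combined = {}
    for patient, biopsies in per_biopsy.items():
        variant_sets = list(biopsies.values())
        if mode == "intersection":
            combined[patient] = set.intersection(*variant_sets)
        elif mode == "union":
            combined[patient] = set.union(*variant_sets)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return combined


# Illustrative records only; patients, biopsies and coordinates are invented.
calls = [
    ("P1", "biopsy_a", "7", 140453136, "A", "T"),
    ("P1", "biopsy_b", "7", 140453136, "A", "T"),
    ("P1", "biopsy_a", "17", 7577120, "C", "T"),  # seen in one biopsy only
    ("P2", "biopsy_a", "12", 25398284, "C", "A"),  # patient with one biopsy
]

shared = combine_calls(calls, mode="intersection")
pooled = combine_calls(calls, mode="union")
```

Note that a biopsy with zero calls never appears in `calls`, so it would be silently ignored by the intersection; if that can happen in your cohort, pass the list of biopsies per patient explicitly. For patients with a single biopsy, both modes return the same set.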
I wonder what you ended up doing here. Would multiple related samples violate GISTIC's assumptions?