Hi all,
I am looking for tools to identify whether certain recurrent SNPs in a given cancer (not SNPs in the gene itself, but in other genes) are associated with the downregulation of a given gene.
So I have a cancer dataset comprising SNPs for each patient (MAF) and expression data (RSEM RNA-seq data) for each patient. I have a particular gene of interest and I want to associate SNPs in other genes with its downregulation. Any ideas how to associate the two? Can anyone point me in the right direction?
Any advice is very much appreciated.
Thank you very much for your reply!
Yes, I am also worried that I will end up with nothing...
1) Generally it's high-quality data: all samples were generated the same way, I am using normalised data, and there is no batch effect.
2) I have no normals - just expression levels in the tumors (RSEM). I am planning on using the median to define up/down-regulation.
3) The SNPs are cancer-specific (somatic).
So the first step would be to identify "hotspots" - genes that are mutated multiple times in different patients. Then simply do a Fisher exact test to see if it's significant?
Any other suggestions? Thanks!!
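For illustration, the median split plus Fisher exact test described above can be sketched as follows. This is a minimal standard-library sketch, not a recommended pipeline: the patient IDs, expression values and the `mutated` set are made up, and the helper names `contingency` and `fisher_exact_two_sided` are hypothetical. In practice `scipy.stats.fisher_exact` would be the usual choice.

```python
from math import comb
from statistics import median


def contingency(expression, mutated):
    """Build a 2x2 table: rows = mutated / wild-type, columns = low / high expression."""
    cutoff = median(expression.values())  # median split, as proposed in the post
    table = [[0, 0], [0, 0]]
    for patient, value in expression.items():
        row = 0 if patient in mutated else 1
        col = 0 if value < cutoff else 1
        table[row][col] += 1
    return table


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Made-up toy data: expression of the gene of interest per patient,
# and the patients carrying a somatic mutation in one candidate gene.
expression = {"P1": 2.1, "P2": 1.8, "P3": 9.5, "P4": 8.7,
              "P5": 2.4, "P6": 10.2, "P7": 1.5, "P8": 9.9}
mutated = {"P1", "P2", "P5", "P7"}

table = contingency(expression, mutated)
p_value = fisher_exact_two_sided(table[0][0], table[0][1], table[1][0], table[1][1])
print(table, p_value)
```

Note that dichotomising at the median throws away the magnitude of the expression values, so a test on the continuous values may be more sensitive.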
I would imagine you will need to group somatic mutations together in a reasonable way. Otherwise you will be restricted to certain highly prevalent driver genes which have very highly recurrent hotspots, such as V600 in BRAF or G12 in KRAS. Or based on this comment ("So the first step would be to identify "hotspots" - genes that are mutated multiple times in different patients."), are you just using all somatic mutations within driver genes? The latter will definitely have a mixture of passenger mutations that would add substantial noise to any association.
Thanks for your input. Originally I was thinking of using all nonsynonymous SNPs. But yes, you are right. Do you have any suggestions how to do this? I read about MutSig - this appears to be a good option.
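As a crude first pass before running a dedicated tool, nonsynonymous mutations can be collapsed to the gene level and counted per patient. The sketch below is only a recurrence filter and is not what MutSig does; it makes no attempt to separate drivers from passengers. It assumes the MAF has already been parsed into dictionaries keyed by the standard MAF columns `Hugo_Symbol`, `Tumor_Sample_Barcode` and `Variant_Classification`. The `NONSYNONYMOUS` set, the `min_patients` threshold and the function name are illustrative choices, and the rows are made up.

```python
from collections import defaultdict

# Variant_Classification values treated as nonsynonymous (an assumption; adjust as needed).
NONSYNONYMOUS = {
    "Missense_Mutation", "Nonsense_Mutation", "Nonstop_Mutation",
    "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
    "Splice_Site", "Translation_Start_Site",
}


def recurrently_mutated_genes(maf_rows, min_patients=2):
    """Map each gene to the set of patients with at least one nonsynonymous mutation in it."""
    patients_by_gene = defaultdict(set)
    for row in maf_rows:
        if row["Variant_Classification"] in NONSYNONYMOUS:
            # A set means several mutations in one patient count once.
            patients_by_gene[row["Hugo_Symbol"]].add(row["Tumor_Sample_Barcode"])
    return {gene: patients for gene, patients in patients_by_gene.items()
            if len(patients) >= min_patients}


# Made-up rows standing in for a parsed MAF file.
maf_rows = [
    {"Hugo_Symbol": "BRAF", "Tumor_Sample_Barcode": "P1", "Variant_Classification": "Missense_Mutation"},
    {"Hugo_Symbol": "BRAF", "Tumor_Sample_Barcode": "P1", "Variant_Classification": "Missense_Mutation"},
    {"Hugo_Symbol": "BRAF", "Tumor_Sample_Barcode": "P2", "Variant_Classification": "Missense_Mutation"},
    {"Hugo_Symbol": "KRAS", "Tumor_Sample_Barcode": "P3", "Variant_Classification": "Missense_Mutation"},
    {"Hugo_Symbol": "TP53", "Tumor_Sample_Barcode": "P1", "Variant_Classification": "Silent"},
    {"Hugo_Symbol": "TP53", "Tumor_Sample_Barcode": "P2", "Variant_Classification": "Silent"},
]

recurrent = recurrently_mutated_genes(maf_rows)
print(recurrent)
```

The resulting patient sets give the mutated versus wild-type grouping for each candidate gene.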
Sounds like your data is good. Why Fisher? Don't you want to use the actual expression levels for a t-test or Wilcoxon test?
Yes, you are right. After a voom transformation of the RSEM values, I could do a t-test, correct?
I was first thinking of generating a contingency table like this:
Do you think a t-test would be the better choice here?
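To make the alternative concrete, here is a minimal sketch of the Wilcoxon rank-sum (Mann-Whitney U) test suggested earlier in the thread, applied to the continuous expression values of mutated versus wild-type patients. It uses the normal approximation with a tie correction and no continuity correction, so with small groups an exact test (for example `scipy.stats.mannwhitneyu`) is preferable. The function name and the numbers are made up for illustration. Because the test uses only ranks, its result does not change under a monotone transform such as a log.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected)."""
    pooled = sorted((value, group) for group, values in enumerate((x, y)) for value in values)
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t**3 - t
        i = j

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value tied: no evidence either way
        return u, 1.0
    z = (u - mean_u) / sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


# Made-up expression values of the gene of interest in each group.
mutated_expr = [1.5, 1.8, 2.1, 2.4]
wildtype_expr = [8.7, 9.5, 9.9, 10.2]

u_stat, p_value = mann_whitney_u(mutated_expr, wildtype_expr)
print(u_stat, p_value)
```

A t-test on suitably transformed values uses the magnitudes as well as the ranks, at the cost of stronger distributional assumptions.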