How to identify whether certain recurrent SNPs in a given cancer are associated with the downregulation of a given gene?
7.0 years ago
JJ ▴ 710

Hi all,

I am looking for tools to identify whether certain recurrent SNPs (not SNPs in the gene itself, but in other genes) in a given cancer are associated with the downregulation of a given gene.

I have a cancer dataset comprising SNPs for each patient (MAF) and expression data (RSEM RNA-seq data) for each patient. I have a particular gene of interest, and I want to associate SNPs in other genes with its downregulation. Any ideas on how to associate the two? Can anyone point me in the right direction?

Any advice is very much appreciated.

RNA-Seq SNP • 1.6k views
7.0 years ago
Asaf 10k

You have a lot of noise in the system, so take good care or you'll end up with nothing. A few questions you might want to ask yourself:

  1. Is all the expression data generated in the same way? Is there a batch effect?
  2. Do you have a reference tissue to compare expression to, or are you just looking at the expression level in the tumor? How would you normalize the expression in either case?
  3. Are the SNPs cancer-specific? Does it matter to you? (Again, reference.)

Your major goal is to "align" the data between patients. Once you have a matrix of SNPs vs. patients and a table of genes (transcripts?) vs. patients with expression levels, most of the work will be behind you and you'll just have to do some relatively simple statistics.
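As a rough illustration of that alignment step, here is a minimal Python sketch. It assumes the MAF has already been read into a list of records with the standard `Hugo_Symbol` and `Tumor_Sample_Barcode` columns; the patient IDs, the expression values, and the helper name `build_mutation_matrix` are made up for the example.

```python
from collections import defaultdict

# Toy MAF-like records (hypothetical values; column names follow the MAF format).
maf_rows = [
    {"Tumor_Sample_Barcode": "P1", "Hugo_Symbol": "TP53"},
    {"Tumor_Sample_Barcode": "P2", "Hugo_Symbol": "TP53"},
    {"Tumor_Sample_Barcode": "P2", "Hugo_Symbol": "KRAS"},
    {"Tumor_Sample_Barcode": "P4", "Hugo_Symbol": "KRAS"},
]

# Toy expression values for the gene of interest, one per patient (hypothetical).
expression = {"P1": 5.1, "P2": 4.8, "P3": 9.7, "P4": 8.9}


def build_mutation_matrix(rows, patients):
    """Return {gene: {patient: 0 or 1}} restricted to the given patients."""
    carriers = defaultdict(set)
    for row in rows:
        carriers[row["Hugo_Symbol"]].add(row["Tumor_Sample_Barcode"])
    return {
        gene: {p: int(p in mutated) for p in patients}
        for gene, mutated in carriers.items()
    }


# Use the patients that have expression data, in a fixed order,
# so both tables share the same patient axis.
patients = sorted(expression)
matrix = build_mutation_matrix(maf_rows, patients)

for gene in sorted(matrix):
    print(gene, [matrix[gene][p] for p in patients])
```

Note that a 0 here only means "no call in the MAF"; a patient who was never sequenced should be dropped rather than treated as unmutated.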


Thank you very much for your reply!

Yes I am also worried that I will end up with nothing....

1) Generally it's high-quality data, and it was all generated the same way. I am using normalised data. No batch effect.

2) I have no normals - just expression levels in the tumors (RSEM). I am planning on using the median to define up/down-regulation.

3) The SNPs are cancer-specific (somatic).

So the first step would be to identify "hotspots" - genes that are mutated multiple times in different patients. Then simply do a Fisher exact test to see if it's significant?
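A minimal sketch of that first step, counting in how many distinct patients each gene is mutated. The records are toy data, the `min_patients` threshold and the choice to skip `Silent` calls are assumptions for illustration, and `recurrent_genes` is a hypothetical helper name.

```python
from collections import defaultdict

# Toy somatic calls (hypothetical); the column names follow the MAF format.
maf_rows = [
    {"Tumor_Sample_Barcode": "P1", "Hugo_Symbol": "TP53",
     "Variant_Classification": "Missense_Mutation"},
    {"Tumor_Sample_Barcode": "P1", "Hugo_Symbol": "TP53",
     "Variant_Classification": "Nonsense_Mutation"},
    {"Tumor_Sample_Barcode": "P2", "Hugo_Symbol": "TP53",
     "Variant_Classification": "Missense_Mutation"},
    {"Tumor_Sample_Barcode": "P3", "Hugo_Symbol": "TP53",
     "Variant_Classification": "Silent"},
    {"Tumor_Sample_Barcode": "P3", "Hugo_Symbol": "TTN",
     "Variant_Classification": "Missense_Mutation"},
]


def recurrent_genes(rows, min_patients=2, skip=("Silent",)):
    """Return {gene: n_patients} for genes mutated in >= min_patients patients."""
    carriers = defaultdict(set)
    for row in rows:
        if row["Variant_Classification"] in skip:
            continue  # ignore classes assumed to be uninformative
        # A set per gene means a patient with two hits is counted once.
        carriers[row["Hugo_Symbol"]].add(row["Tumor_Sample_Barcode"])
    return {g: len(p) for g, p in carriers.items() if len(p) >= min_patients}


print(recurrent_genes(maf_rows))
```

A raw count like this does not correct for gene length or background mutation rate, so long genes can look recurrent by chance.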

Any other suggestions? Thanks!!


I would imagine you will need to group somatic mutations together in a reasonable way. Otherwise you will be restricted to certain highly prevalent driver genes which have very highly recurrent hotspots, such as V600 in BRAF or G12 in KRAS. Or based on this comment ("So the first step would be to identify "hotspots" - genes that are mutated multiple times in different patients."), are you just using all somatic mutations within driver genes? The latter will definitely have a mixture of passenger mutations that would add substantial noise to any association.
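One possible way to group mutations, sketched in Python: collapse protein changes onto the affected codon, so that for example V600E and V600K both count towards "BRAF V600". This assumes the MAF carries an `HGVSp_Short` column (as in GDC-style MAFs); the records below are toy data and `group_by_codon` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Toy records (hypothetical patients); HGVSp_Short holds the protein change.
maf_rows = [
    {"Tumor_Sample_Barcode": "P1", "Hugo_Symbol": "BRAF", "HGVSp_Short": "p.V600E"},
    {"Tumor_Sample_Barcode": "P2", "Hugo_Symbol": "BRAF", "HGVSp_Short": "p.V600K"},
    {"Tumor_Sample_Barcode": "P3", "Hugo_Symbol": "KRAS", "HGVSp_Short": "p.G12D"},
    {"Tumor_Sample_Barcode": "P4", "Hugo_Symbol": "KRAS", "HGVSp_Short": "p.G12V"},
    {"Tumor_Sample_Barcode": "P5", "Hugo_Symbol": "KRAS", "HGVSp_Short": "p.A146T"},
]

# Reference amino acid followed by the codon number, e.g. "p.V600E" -> ("V", "600").
CODON = re.compile(r"^p\.([A-Z])(\d+)")


def group_by_codon(rows):
    """Map 'GENE <ref aa><position>' (e.g. 'BRAF V600') to the set of carriers."""
    groups = defaultdict(set)
    for row in rows:
        match = CODON.match(row["HGVSp_Short"])
        if match is None:
            continue  # skip records without a parsable protein change
        key = f'{row["Hugo_Symbol"]} {match.group(1)}{match.group(2)}'
        groups[key].add(row["Tumor_Sample_Barcode"])
    return dict(groups)


hotspots = group_by_codon(maf_rows)
for key in sorted(hotspots):
    print(key, sorted(hotspots[key]))
```

Grouping by codon keeps hotspot-level resolution; grouping by gene instead gives larger groups but mixes in likely passenger mutations.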


Thanks for your input. Originally I was thinking of using all nonsynonymous SNPs. But yes, you are right. Do you have any suggestions on how to do this? I read about MutSig, which appears to be a good option.


Sounds like your data is good. Why Fisher? Don't you want to use the actual expression levels for a t-test or Wilcoxon test?
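To make the rank-based option concrete, here is a standard-library-only sketch of a two-sided Wilcoxon rank-sum (Mann-Whitney U) test using the normal approximation. The expression values are invented, and the function names are hypothetical; in practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(mutated, wild_type):
    """Return (U for the mutated group, approximate two-sided p-value).

    Uses the normal approximation without tie or continuity correction,
    so the p-value is only approximate for small or heavily tied samples.
    """
    n1, n2 = len(mutated), len(wild_type)
    ranks = average_ranks(list(mutated) + list(wild_type))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return u1, 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical normalised expression of the gene of interest.
mutated = [4.8, 5.1, 5.3, 4.9, 5.0]    # patients carrying the mutation
wild_type = [8.9, 9.7, 9.1, 8.5, 9.4]  # patients without it

u, p = rank_sum_test(mutated, wild_type)
print(f"U = {u}, p = {p:.4f}")
```

Unlike a high/low split, this uses the actual expression levels, and it would be run once per candidate gene or hotspot, so the resulting p-values need multiple-testing correction.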


Yes, you are right. After a voom transformation of the RSEM values, I could do a t-test, correct?

I was first thinking of generating a contingency table like this:

            low exp    high exp
mut            a         b
not mut        c         d

Do you think a t-test would be the better choice here?
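For reference, the contingency-table approach could be sketched as follows, using a median split to define low/high expression and a two-sided Fisher exact test written with the standard library only. The data are invented and both function names are hypothetical; `scipy.stats.fisher_exact` is the usual ready-made routine.

```python
from math import comb
from statistics import median


def contingency_table(expression, mutated):
    """Return (a, b, c, d) = (mut/low, mut/high, not mut/low, not mut/high)."""
    cutoff = median(expression.values())
    a = b = c = d = 0
    for patient, value in expression.items():
        low = value < cutoff  # "low exp" means below the cohort median
        if patient in mutated:
            a, b = a + low, b + (not low)
        else:
            c, d = c + low, d + (not low)
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Hypothetical expression of the gene of interest and mutation carriers.
expression = {"P1": 4.8, "P2": 5.1, "P3": 5.3, "P4": 4.9, "P5": 5.0,
              "P6": 8.9, "P7": 9.7, "P8": 9.1, "P9": 8.5, "P10": 9.4}
mutated = {"P1", "P2", "P3", "P4", "P5"}

table = contingency_table(expression, mutated)
p_value = fisher_exact_two_sided(*table)
print(table, f"p = {p_value:.4f}")
```

The median split throws away how far each patient is from the cutoff, which is the main trade-off against a test on the continuous expression values.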
