Question

Differential expression analysis for TCGA level 3 RNASeqV2 data?


10.1 years ago

cafelumiere12 ▴ 80

Hi! I am looking at TCGA level 3 RNASeqV2 data. My goal is to find the DEGs (tumor vs. normal), and I'm currently looking at LUAD.

I am using edgeR at the moment, since the original RSEM paper mentioned that RSEM estimates can be processed by edgeR/DESeq.

I have a couple of questions and was wondering if anyone might have suggestions:

Does it make sense to include all the available tumor samples, including those without a matching normal sample from the same participant, when analyzing for DEGs? If so, what normalization method would be recommended? Or can I just use edgeR's default normalization?
I started out looking only at the matched TN and NT samples. Using the above, I'm getting 5639 DEGs out of 20531 genes (FDR <= 0.05, FC >= 2), which seems like a lot (and even more if I don't use any FC filter).
There seem to be various discussions regarding which tools to use:
- http://seqanswers.com/forums/showthread.php?t=28515
- https://groups.google.com/forum/#!topic/rsem-users/H1cswrvvmPs
Does anyone with more experience analyzing TCGA datasets have thoughts on whether it is OK to use edgeR, or should I use another tool such as EBSeq for RNASeqV2 data?

Any suggestions are greatly appreciated. Thanks a lot in advance!

RNA-Seq R edgeR • 7.6k views

updated 3.7 years ago by Ram 44k • written 10.1 years ago by cafelumiere12 ▴ 80


edgeR and DESeq are okay, from my point of view, for read-count datasets.

updated 3.7 years ago by Ram 44k • written 10.1 years ago by Manvendra Singh ★ 2.2k


If the data are in RPKM, edgeR is a terrible choice. The manual makes it very clear why it needs raw observed read counts.

Once you have counts, so that edgeR (or voom) is a valid method, you will also likely need to run a paired test if you really do have paired data. It's in the manual.
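
Before a paired test can be set up, each tumor sample has to be matched to the normal sample from the same participant. Here is a minimal sketch of that bookkeeping, assuming standard TCGA barcodes in which the first 12 characters identify the participant and characters 14-15 give the sample type (01 for primary solid tumor, 11 for solid tissue normal). The function name and the example barcodes are hypothetical, and Python is used only for illustration:

```python
from collections import defaultdict

TUMOR_CODE = "01"   # primary solid tumor
NORMAL_CODE = "11"  # solid tissue normal


def matched_pairs(barcodes):
    """Return {participant: (tumor_barcode, normal_barcode)} for
    participants that have both a tumor and a normal sample."""
    by_participant = defaultdict(dict)
    for bc in barcodes:
        participant = bc[:12]    # e.g. "TCGA-AA-0001"
        sample_type = bc[13:15]  # e.g. "01" or "11"
        # Keep the first barcode seen for each (participant, sample type).
        by_participant[participant].setdefault(sample_type, bc)

    pairs = {}
    for participant, samples in by_participant.items():
        if TUMOR_CODE in samples and NORMAL_CODE in samples:
            pairs[participant] = (samples[TUMOR_CODE], samples[NORMAL_CODE])
    return pairs


# Hypothetical barcodes, for illustration only.
example = [
    "TCGA-AA-0001-01A-01R-0000-07",  # tumor, participant 0001
    "TCGA-AA-0001-11A-01R-0000-07",  # normal, participant 0001
    "TCGA-AA-0002-01A-01R-0000-07",  # tumor only, participant 0002
]
print(matched_pairs(example))
```

The participant labels recovered this way can then serve as the blocking factor in the design (participant plus tissue type), which is how the edgeR user's guide handles paired samples.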

updated 3.7 years ago by Ram 44k • written 10.1 years ago by ross.lazarus • 0

Ram · Accepted Answer · 2014-10-30

My personal opinion is that including all the samples would not be a good idea: you would have a lot of variance and so would lose many DEGs. Once your matrix is ready, apply quantile normalization, then run hierarchical clustering of the samples using Spearman correlation. Choose the biggest clusters (each should have many more samples than the other clusters), one for the patient samples and one for the normal samples. Treat these as the two groups and do the DE analysis.
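
The preprocessing described above can be sketched as follows. This is a minimal pure-Python illustration, assuming a small genes-by-samples matrix held as a list of rows. The function names are hypothetical, ties in the quantile step are broken by sort order for simplicity, and the hierarchical clustering itself is left to an existing routine (for example `hclust` in R) fed with 1 - rho as the distance:

```python
def quantile_normalize(matrix):
    """Quantile-normalize a genes-by-samples matrix (list of rows)."""
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the k-th smallest value across all samples.
    rank_means = [sum(sc[k] for sc in sorted_cols) / n_samples
                  for k in range(n_genes)]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda i: col[i])
        for k, i in enumerate(order):
            normalized[i][j] = rank_means[k]
    return normalized


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def spearman_distances(matrix):
    """Sample-by-sample distance matrix, 1 - Spearman rho."""
    n_samples = len(matrix[0])
    cols = [[row[j] for row in matrix] for j in range(n_samples)]
    return [[1.0 - spearman(cols[a], cols[b]) for b in range(n_samples)]
            for a in range(n_samples)]
```

Samples that fall into small outlying clusters in the resulting distance matrix are the ones this approach would leave out of the two groups.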
Yes, you are getting a lot of DEGs. If you want to narrow them down, I have seen some publications that use an FDR threshold of 0.01. But about 1/4 of the total genes is okay.
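
The effect of tightening the thresholds can be checked directly on the results table. Here is a minimal sketch, assuming a list of (gene, log2 fold change, raw p-value) tuples exported from the DE tool. The function names and toy values are hypothetical, and the Benjamini-Hochberg step is the same adjustment that `p.adjust(method = "BH")` performs in R:

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def filter_degs(results, fdr_cutoff=0.05, min_abs_log2fc=1.0):
    """Keep genes passing both the FDR and the fold-change cutoff.

    FC >= 2 in either direction corresponds to |log2FC| >= 1.
    """
    fdrs = bh_adjust([p for _, _, p in results])
    return [gene for (gene, lfc, _), fdr in zip(results, fdrs)
            if fdr <= fdr_cutoff and abs(lfc) >= min_abs_log2fc]


# Toy values, for illustration only.
results = [
    ("geneA", 2.0, 0.001),
    ("geneB", 0.5, 0.001),
    ("geneC", -1.5, 0.02),
    ("geneD", 3.0, 0.8),
]
print(filter_degs(results))                   # FDR <= 0.05, FC >= 2
print(filter_degs(results, fdr_cutoff=0.01))  # stricter FDR threshold
```

Comparing the two calls on the real table shows how many genes the stricter 0.01 threshold removes.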