Hi! I am looking at TCGA level 3 RNASeqV2 data. My goal is to find the DEGs (tumor vs. normal), and I'm working on LUAD at the moment.
I am currently using edgeR, since the original RSEM paper mentioned that RSEM output can be processed by edgeR/DESeq.
I have a couple of questions and would appreciate any suggestions:
Does it make sense to include all the available tumor samples, including those that don't have a matching normal sample from the same participant, when testing for DEGs? What kind of normalization method would be recommended if I do so? Or can I just use the default normalization of edgeR?
I started out looking at only the matched TN and NT samples. Using edgeR as described above, I'm getting 5639 DEGs out of 20531 genes (FDR <= 0.05, FC >= 2), which seems like a lot (and even more if I don't use any FC filter).
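To be explicit about what the FDR/FC filter means, here is a minimal sketch in Python. It is not the edgeR code itself: the gene names, fold changes, and p-values are made up, the function names are hypothetical, and I am assuming that FC >= 2 means a twofold change in either direction (|log2FC| >= 1) and that the FDR is a Benjamini-Hochberg adjusted p-value.

```python
import math

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_degs(results, fdr_cutoff=0.05, fc_cutoff=2.0):
    """results: list of (gene, log2_fold_change, pvalue) tuples."""
    fdr = benjamini_hochberg([p for _, _, p in results])
    min_abs_log2fc = math.log2(fc_cutoff)
    return [gene for (gene, lfc, _), q in zip(results, fdr)
            if q <= fdr_cutoff and abs(lfc) >= min_abs_log2fc]

# Made-up example table: (gene, log2FC, raw p-value).
results = [
    ("geneA", 2.5, 0.001),
    ("geneB", 0.4, 0.002),   # significant but below the FC cutoff
    ("geneC", -1.2, 0.03),   # down-regulated, passes both cutoffs
    ("geneD", 1.8, 0.6),     # large FC but not significant
]
degs = select_degs(results)
```

With these toy numbers, only geneA and geneC pass both cutoffs.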
There seem to be various discussions about which tools to use:
- http://seqanswers.com/forums/showthread.php?t=28515
- https://groups.google.com/forum/#!topic/rsem-users/H1cswrvvmPs
I wonder whether anyone with more experience analyzing TCGA datasets has any thoughts on whether it is OK to use edgeR, or whether I should use another tool such as EBSeq for RNASeqV2 data.
Any suggestions are greatly appreciated. Thanks a lot in advance!
edgeR and DESeq are fine, from my point of view, for read count datasets.
If the data are in RPKM, edgeR is a terrible choice. The manual makes it very clear why it needs raw observed read counts.
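To illustrate the point with made-up numbers: RPKM divides out gene length and library size, so two genes supported by very different numbers of reads can end up with the same RPKM. A count-based model needs the counts themselves to judge how noisy each measurement is. A minimal sketch in Python, where the `rpkm` helper and all values are hypothetical:

```python
def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)

# Two hypothetical genes in a library of 20 million mapped reads.
short_gene = rpkm(count=10, gene_length_bp=500, total_mapped_reads=20_000_000)
long_gene = rpkm(count=1000, gene_length_bp=50_000, total_mapped_reads=20_000_000)

# Both come out at the same RPKM, although one is backed by 10 reads
# and the other by 1000.
print(short_gene, long_gene)
```

Once the data are reduced to RPKM, that difference in read support is no longer visible.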
Once you have counts, so that edgeR (or voom) is a valid method, you will also likely need to run a paired test if you really do have paired data. It's in the manual.
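For the pairing itself, here is a minimal sketch in Python of how matched samples could be picked out of TCGA barcodes and laid out as a paired (patient-blocked) design. The barcodes below are made up and the function names are hypothetical. It relies on the TCGA convention that the first three barcode fields identify the participant and that sample-type code 01 is a primary solid tumor and 11 a solid tissue normal. The resulting rows have the layout of an additive patient + tissue model, which is my understanding of what a paired analysis in edgeR is built on; the actual fit would still be done in edgeR.

```python
from collections import defaultdict

# TCGA sample-type codes (4th barcode field, first two characters).
TUMOR, NORMAL = "01", "11"

def split_barcode(barcode):
    """Return (participant_id, sample_type_code) for a TCGA barcode."""
    fields = barcode.split("-")
    return "-".join(fields[:3]), fields[3][:2]

def matched_pairs(barcodes):
    """Keep only participants with both a primary tumor and a normal sample."""
    samples = defaultdict(dict)
    for barcode in barcodes:
        participant, code = split_barcode(barcode)
        if code == TUMOR:
            samples[participant]["tumor"] = barcode
        elif code == NORMAL:
            samples[participant]["normal"] = barcode
    return {p: s for p, s in sorted(samples.items()) if len(s) == 2}

def paired_design(pairs):
    """Rows of [intercept, patient dummies..., is_tumor], with the first
    patient and normal tissue as the reference levels."""
    participants = list(pairs)
    rows = {}
    for i, participant in enumerate(participants):
        dummies = [1 if j == i else 0 for j in range(1, len(participants))]
        rows[pairs[participant]["normal"]] = [1] + dummies + [0]
        rows[pairs[participant]["tumor"]] = [1] + dummies + [1]
    return rows

# Made-up barcodes: participant 0002 has no matching normal sample.
barcodes = [
    "TCGA-AA-0001-01A-01R-0000-07",
    "TCGA-AA-0001-11A-01R-0000-07",
    "TCGA-AA-0002-01A-01R-0000-07",
    "TCGA-AA-0003-01A-01R-0000-07",
    "TCGA-AA-0003-11A-01R-0000-07",
]
pairs = matched_pairs(barcodes)
design = paired_design(pairs)
```

The last column is the tumor-vs-normal indicator, so its coefficient is the within-patient tumor effect. The unmatched participant is dropped here, which is the simplest choice rather than the only one.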